
msac's Introduction

Molecular Spectrometry Adduct Calculator (MSAC)

The Molecular Spectrometry Adduct Calculator (MSAC) calculates the m/z of potential adducts from given compound masses, which can help when interpreting mass spectra. The user provides a .csv with a column of monoisotopic masses titled 'mass'; other columns are preserved. Optionally, the user can also supply a more targeted list of adducts to calculate: a .csv with columns 'adduct' and 'charge', where adducts are written without brackets, such as M+Na or M-H.
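For orientation, the arithmetic behind an adduct m/z can be sketched as follows. This is the standard textbook calculation, not MSAC's own code: it assumes an adduct [nM + ion] with charge z has m/z = (n·M + ion mass − z·electron mass) / |z|, where a positive z means electrons were lost and a negative z means electrons were gained. The function name `adduct_mz` and the constants are illustrative; the hydrogen and electron masses match the values quoted in the issue further down this page.

```python
# Monoisotopic masses in u (illustrative constants, not read from MSAC).
H = 1.00782503223
NA = 22.989769282
ELECTRON = 0.00054857990924


def adduct_mz(mass: float, n_mol: int, ion_mass: float, charge: int) -> float:
    """m/z of [nM + ion] with the given signed charge (hypothetical helper).

    A positive charge removes `charge` electrons; a negative charge adds them.
    """
    total = n_mol * mass + ion_mass - charge * ELECTRON
    return total / abs(charge)


# [M+H]+ for a compound with monoisotopic mass 100.0
print(round(adduct_mz(100.0, 1, H, 1), 5))  # prints 101.00728
```

The same function covers [M-H]- (`ion_mass=-H, charge=-1`), doubly charged ions such as [M+2H]2+ (`ion_mass=2*H, charge=2`) and multimers such as [2M+Na]+ (`n_mol=2`).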

Getting Started

These instructions will get the calculator running as a Python package.

Prerequisites

Installation through Anaconda is recommended so that the dependency versions are compatible with each other.

Dependencies are pandas, numpy, and molmass.

Using conda:

conda env create -f msac-env.yml
conda activate msac-env

A requirements.txt file is also available.

Installing

In your activated conda environment, install the calculator.

# clone/install
git clone https://github.com/pnnl/msac.git
pip install msac/

# direct
pip install git+https://github.com/pnnl/msac

Running the calculator

MSAC is provided with two adduct lists: a default list of 13 common adducts and an extensive list of adducts found in NIST17. To calculate the m/z for these adducts based on your input masses, pass MSAC a .csv with the compound masses in a column titled 'mass'. Other columns present in the file will be preserved.

msac input.csv

If your input file's monoisotopic mass column is not called 'mass', pass its title as a string with -m.

msac input.csv -m "Monoisotopic Mass"

You can also specify an output file name with -o; the default is {input_name}_adducts.csv.

msac input.csv -o my_output_name.csv

If you want to use your own list of adducts, create a .csv with a column called 'adduct' and a column called 'charge'. Check example_data/adduct_list_full.csv for an example. The default adduct list included in the distribution is used here.

msac input.csv -f adduct_only_list.csv
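A custom adduct list in the format described above can be written with pandas. This is a minimal sketch: the adducts chosen are arbitrary, and storing 'charge' as a signed integer is an assumption based on the column description, so compare against example_data/adduct_list_full.csv before relying on it.

```python
import pandas as pd

# Adducts are written without brackets (e.g. M+Na, M-H).
# Assumption: 'charge' is a signed integer (+1 for positive mode, -1 for negative).
custom = pd.DataFrame(
    {
        "adduct": ["M+H", "M+Na", "M-H", "2M+H"],
        "charge": [1, 1, -1, 1],
    }
)

# Write without the pandas index so the file has exactly two columns.
custom.to_csv("adduct_only_list.csv", index=False)
```

The resulting adduct_only_list.csv can then be passed to msac as in the command above.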

If you want to include potential neutral losses, add the -n flag.

msac input.csv -n

If you're using the default adduct list or the neutral loss list with -n, you can use -c to limit the number of adducts based on their prevalence in NIST/GNPS/MassBank. A value of 1.0 or less is read as a fraction: you get all adducts that cover more than that fraction of NIST/GNPS/MassBank. An integer value of 2 or more gives that number of the most common adducts.

msac input.csv -c 0.75 -o "75th_percentile_adducts.csv"
msac input.csv -c 5 -o "top_5_most_common_adducts.csv"
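One way to read the two modes of -c is sketched below. This is not MSAC's implementation: it assumes a hypothetical 'prevalence' column holding each adduct's fraction of library entries, and it interprets a fractional cutoff as "take the most common adducts until their cumulative coverage reaches the cutoff". The text above does not say which interpretation MSAC uses.

```python
import pandas as pd


def select_adducts(adducts: pd.DataFrame, cutoff: float) -> pd.DataFrame:
    """Hypothetical sketch of a prevalence cutoff.

    Assumes a 'prevalence' column with each adduct's fraction of library entries.
    cutoff <= 1.0: most common adducts until cumulative coverage reaches cutoff.
    cutoff >= 2:   that many of the most common adducts.
    """
    ranked = adducts.sort_values("prevalence", ascending=False).reset_index(drop=True)
    if cutoff <= 1.0:
        cumulative = ranked["prevalence"].cumsum()
        # Rows strictly below the cutoff, plus the row that crosses it.
        n_rows = int((cumulative < cutoff).sum()) + 1
        return ranked.head(n_rows)
    return ranked.head(int(cutoff))


example = pd.DataFrame(
    {
        "adduct": ["M+Na", "M+H", "M-H", "M+K", "M+NH4"],
        "prevalence": [0.2, 0.5, 0.15, 0.05, 0.1],
    }
)
print(select_adducts(example, 0.75)["adduct"].tolist())
```

With this reading, a cutoff of 0.75 keeps M+H, M+Na and M-H, whose cumulative share (0.85) is the first to reach 0.75.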

If your input file has a column of formulas, MSAC can calculate the input masses from it and/or output 'NaN' values when a neutral loss is impossible because the molecule has too few atoms of the required elements.

# have MSAC calculate the mass using --formula_col
msac input.csv -f "Formula"
# have MSAC restrict losses to those numerically possible using --restrict
msac input.csv -r "Formula"
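The feasibility check behind the restrict option can be illustrated with a simple atom count. This is a sketch, not MSAC's code: the helper names are hypothetical, and the regex parser only handles flat formulas such as C6H12O6 (no parentheses, charges, or isotope labels).

```python
import math
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C6H12O6' (hypothetical helper)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def loss_possible(formula: str, loss: str) -> bool:
    """True if the molecule has at least as many of each atom as the loss needs."""
    have = parse_formula(formula)
    need = parse_formula(loss)
    return all(have[element] >= n for element, n in need.items())


def restricted_mz(mz: float, formula: str, loss: str) -> float:
    """Return mz unchanged if the loss is possible, otherwise NaN."""
    return mz if loss_possible(formula, loss) else math.nan


print(loss_possible("C6H12O6", "H2O"), loss_possible("CH4", "H2O"))  # prints True False
```

Methane (CH4) contains no oxygen, so a water loss is reported as impossible and its m/z becomes NaN.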

Authors

  • Madison Blumer

License

This project is licensed under the BSD License - see the LICENSE.md file for details

Acknowledgments


msac's Issues

Wrong mass for `2M-H+2NH4`

I suspect there is a problem in the parsing of the ion part of the adduct leading to an error in the computation of the mass for 2M-H+2NH4.

Steps to reproduce: run

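import pandas as pd
from msac.calculate_adduct_mz import get_atom_masses, get_adduct_masses

# Load the full adduct list and keep only the problematic adduct
adducts = pd.read_csv('msac/example_data/adduct_list_full.csv')
bug = adducts[adducts.adduct == '2M-H+2NH4']

# Build the atom/mass dictionaries for this adduct, then the ion masses
atom_dict, mass_dict, all_atoms = get_atom_masses(bug)
ions2mass = get_adduct_masses(None, mass_dict, all_atoms)
print(ions2mass)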

Then ions2mass contains {'-H+2NH4': 29.01397304109}, which is incorrect. Looking at intermediate results, mass_dict contains

{'e': 0.00054857990924, 'H': 1.00782503223, 'NH': 15.01089903666}

Even though the NH mass is correct, the formula in the ion is actually NH4. So the mass in ions2mass corresponds to

- mass_dict['H'] + 2 * mass_dict['NH']

which is the wrong computation. It should be

- Formula('H').monoisotopic_mass + 2 * Formula('NH4').monoisotopic_mass

which leads to 35.060923234469996 instead of 29.01397304109.

The same error also occurs in one other case from adduct_list_full.csv, namely M-H+2H2O, where mass_dict again reports just O instead of O2.

I'm working on a complete overhaul of the parsing code that should fix this. As soon as it is ready, I'll submit a pull request.
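The computation the issue asks for, where each whole formula in the ion part is multiplied by its coefficient, can be sketched as below. This is an illustrative parser, not MSAC's code or the promised pull request: the regular expressions, the helper names `formula_mass` and `ion_mass`, and the small element table are assumptions, and it ignores parentheses and isotopes. The element masses agree with the molmass values quoted above.

```python
import re

# Monoisotopic masses (u) for the elements used here; values agree with molmass.
ELEMENT_MASS = {
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "Na": 22.989769282,
}


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a flat formula such as 'NH4' or 'H2O' (hypothetical)."""
    total = 0.0
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ELEMENT_MASS[element] * (int(number) if number else 1)
    return total


def ion_mass(ion: str) -> float:
    """Signed mass of an ion part such as '-H+2NH4' (hypothetical).

    Each term is sign, optional coefficient, then a whole formula, so '+2NH4'
    means two NH4 units rather than two NH units.
    """
    total = 0.0
    for sign, count, formula in re.findall(r"([+-])(\d*)([A-Z][A-Za-z0-9]*)", ion):
        term = (int(count) if count else 1) * formula_mass(formula)
        total += term if sign == "+" else -term
    return total


print(ion_mass("-H+2NH4"))
```

This gives about 35.0609232345 for -H+2NH4, the value the issue expects, and treats -H+2H2O the same way (two full H2O units).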

Specifying polarity?

Thanks for the great tool! Is there a way to specify the ionization mode in the input file so as to return only positive- or negative-mode adducts? Or should we just filter them after the fact?
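Since the adduct list format already carries a 'charge' column, one workaround is to filter the list by the sign of the charge before running MSAC and pass the result as a custom adduct list. This is a sketch, not an MSAC option: it assumes 'charge' is stored as signed numbers, and the helper name `adducts_for_mode` is hypothetical.

```python
import pandas as pd


def adducts_for_mode(adducts: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Keep positive- or negative-mode adducts by the sign of 'charge' (hypothetical).

    Assumes 'charge' holds signed numbers such as 1, 2, -1.
    """
    if mode == "pos":
        return adducts[adducts["charge"] > 0]
    if mode == "neg":
        return adducts[adducts["charge"] < 0]
    raise ValueError("mode must be 'pos' or 'neg'")


# Usage (path as in the first issue above; adjust to your install):
# full = pd.read_csv("msac/example_data/adduct_list_full.csv")
# adducts_for_mode(full, "pos").to_csv("pos_adducts.csv", index=False)
```

The same filter also works after the fact on any output table that keeps a signed 'charge' column.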

Publish on pypi

Hi, it would be great to have this package on PyPI. Would it be possible for you to add it there?

example file not provided

I am interested in using your script. However, an example .csv file was not provided, and I get the following error message:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Simon\.conda\envs\msac-env3\lib\site-packages\msac\example_data\example_compounds_masses.csv'

It would also have been nice to know which Python version your script works with (e.g. Python 3.6).

Thanks
