Batch Correction and Feature Alignment for MZmine

This repository contains scripts and functions for batch correction and feature alignment of mass spectrometry data processed with MZmine. The pipeline includes parsing MGF files, extracting values from MetaboAnalyst CSV files, aligning features across batches, filtering based on cosine similarity, and visualizing mass spectrometry spectra.

Data Requirements:

Process each batch using MZMine accordingly.
Export the aligned feature list for applying the exported data to: (a) GNPS-FBMN (xxxx_quant.csv + xxxx.mgf) (b) MetaboAnalyst (xxxx_MetaboAnalyst.csv)
Name the exported files as follows: xxxx_batch#.mgf, xxxx_batch#.csv, xxxx_batch#_MetaboAnalyst.csv xxxx is the "project name" (e.g., "PHerb1"), and # refers to the batch number.

Setup

Requirements

The required Python packages are listed in the requirements.txt file:

numpy
pandas
plotly
matplotlib
seaborn
pycombat
scikit-learn
pyteomics

Installation

To set up the environment and install the necessary packages, follow these steps:

Clone the repository:

git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name

Ensure that you have Python 3.11 installed on your system. If you don't, adjust the Python version in the run.bat file accordingly.
Run the run.bat file to set up a virtual environment, install the required packages, and launch Jupyter Notebook:
```
./run.bat
```

Usage

Functions

The core functions for processing and analyzing the data are provided in the alignment_functions.py file:

parse_mgf_files(directory_path): Parses MGF files in the specified directory and returns a dictionary of spectra.
extract_values_and_create_dfs(file_paths): Reads _MetaboAnalyst.csv files and creates DataFrames for each file, extracting scan_number, mz_value, and rt_value.
align_features(dfs, mz_threshold=0.01, rt_threshold=0.2): Aligns features across batches based on mz_value and rt_value.
calculate_cosine_similarity(spectrum1, spectrum2): Calculates the cosine similarity between two spectra.
filter_aligned_features(aligned_df, spectra, project_prefix, cosine_threshold=0.9): Filters aligned features based on MS spectra similarity.
plot_ms_spectra(df, feature_batch, spectra, project_prefix): Plots MS spectra for a selected feature using Plotly with vertical lines.
plot_random_ms_spectra(df, spectra, project_prefix): Plots a random MS spectrum using Plotly with vertical lines.

Example

Below is an example of how to use the functions in a Jupyter Notebook:

from alignment_functions import parse_mgf_files, extract_values_and_create_dfs, align_features, filter_aligned_features, plot_ms_spectra, plot_random_ms_spectra

# Define your directory path and project prefix
directory_path = r'C:\Users\borge\Edison_Lab@UGA Dropbox\Ricardo Borges\Projeto_Andrew\Andrew Lab\Projeto herbário\MSMS - Amostras\GNPS_Batches1'
project_prefix = "PHerb1_"

# Parse .mgf files to get spectra
spectra = parse_mgf_files(directory_path)

# Find all _MetaboAnalyst.csv files in the directory
file_paths = [os.path.join(directory_path, file) for file in os.listdir(directory_path) if file.endswith('_MetaboAnalyst.csv')]

# Create separate DataFrames for each file and save first rows
dfs, first_rows = extract_values_and_create_dfs(file_paths)

# Align features across batches with selected mz_threshold and rt_threshold values
mz_threshold = 0.01  # Example value for mz_threshold
rt_threshold = 0.2   # Example value for rt_threshold
aligned_rt_mz_df = align_features(dfs, mz_threshold=mz_threshold, rt_threshold=rt_threshold)

# Filter the aligned features DataFrame with selected cosine_threshold value
cosine_threshold = 0.95  # Example value for cosine_threshold
MSfiltered_aligned_features_df = filter_aligned_features(aligned_rt_mz_df, spectra, project_prefix, cosine_threshold=cosine_threshold)

# Save the filtered DataFrame
output_file_path = os.path.join(directory_path, 'alignment_info_df', 'MSfiltered_aligned_features_df.csv')
MSfiltered_aligned_features_df.to_csv(output_file_path, index=False)

# Plot MS spectra for a specific feature
feature_batch = '48_PHerb1_batch1'
plot_ms_spectra(MSfiltered_aligned_features_df, feature_batch, spectra, project_prefix)

# Plot random MS spectra
plot_random_ms_spectra(MSfiltered_aligned_features_df, spectra, project_prefix)

ricardomborges / batch_aligner Goto Github PK

batch_aligner's Introduction

Batch Correction and Feature Alignment for MZmine

Data Requirements:

Setup

Requirements

Installation

Usage

Functions

Example

batch_aligner's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent