promor

Proteomics Data Analysis and Modeling Tools

License: LGPL v2.1

  • promor is a user-friendly, comprehensive R package that combines proteomics data analysis with machine learning-based modeling.

  • promor streamlines differential expression analysis of label-free quantification (LFQ) proteomics data and building predictive models with top protein candidates.

  • With promor we provide a range of quality control and visualization tools to analyze label-free proteomics data at the protein level.

  • promor takes two input files: (1) either a proteinGroups.txt file produced by MaxQuant or a standard input file containing a quantitative matrix of protein intensities, and (2) an expDesign.txt file containing the experimental design of your proteomics data.

  • The standard input file should be a tab-delimited text file. Proteins or protein groups should be indicated by rows and samples by columns. Protein names should be listed in the first column and you may use a column name of your choice for the first column. The remaining sample column names should match the sample names indicated by the mq_label column in the expDesign.txt file.
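The layout described above can be sketched with base R. This is a minimal illustration on made-up data, not an official template: the protein names, sample names, and intensities are hypothetical, and every expDesign.txt column other than mq_label is an assumption that should be checked against the package documentation.

```r
# Hypothetical intensity matrix: proteins in rows, samples in columns.
# The first column holds protein names; its header ("Protein") is arbitrary.
intensities <- data.frame(
  Protein = c("P1", "P2", "P3"),
  ctrl_1 = c(2.1e7, 5.4e6, NA),
  ctrl_2 = c(1.9e7, 6.0e6, 3.2e5),
  trt_1 = c(4.4e7, NA, 2.9e5),
  trt_2 = c(4.0e7, 5.1e6, 3.5e5),
  stringsAsFactors = FALSE
)

# Experimental design. Only the mq_label column is named in the text above;
# the other columns are ASSUMPTIONS made for illustration.
exp_design <- data.frame(
  mq_label = c("ctrl_1", "ctrl_2", "trt_1", "trt_2"),
  condition = c("ctrl", "ctrl", "trt", "trt"),
  sample_ID = c(1, 2, 1, 2),
  tech_rep = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Write both tables as tab-delimited text files.
std_file <- file.path(tempdir(), "standard_input.txt")
ed_file <- file.path(tempdir(), "expDesign.txt")
write.table(intensities, std_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(exp_design, ed_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Sanity check: the sample column names must match the mq_label entries.
sample_cols <- colnames(intensities)[-1]
labels_match <- setequal(sample_cols, exp_design$mq_label)
labels_match
```

Running a check like `labels_match` before loading the files catches mismatched sample names early. How to tell create_df to read a standard input file is not shown in this README, so consult `?create_df` for the relevant argument.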

🚨Check out our R Shiny app: PROMOR App


Installation

Install the released version from CRAN

install.packages("promor")

Install development version from GitHub

# install devtools, if you haven't already:
install.packages("devtools")

# install promor from github
devtools::install_github("caranathunge/promor")

Proteomics data analysis with promor

Figure 1. A schematic diagram of suggested workflows for proteomics data analysis with promor.

Example

Here is a minimal working example showing how to identify differentially expressed proteins between two conditions using promor in five simple steps. We use a previously published data set from Cox et al. (2014) (PRIDE ID: PXD000279).

# Load promor
library(promor)

# Create a raw_df object with the files provided in this github account.
raw <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)

# Filter out proteins with high levels of missing data in either condition or group
raw_filtered <- filterbygroup_na(raw)

# Impute missing data and create an imp_df object.
imp_df <- impute_na(raw_filtered)

# Normalize data and create a norm_df object
norm_df <- normalize_data(imp_df)

# Perform differential expression analysis and create a fit_df object
fit_df <- find_dep(norm_df)

Let's take a look at the results using a volcano plot.

volcano_plot(fit_df, text_size = 5)


Modeling with promor

Figure 2. A schematic diagram of suggested workflows for building predictive models with promor.

Example

The following minimal working example shows you how to use your results from differential expression analysis to build machine learning-based predictive models using promor.

We use a previously published data set from Suvarna et al. (2021) that used differentially expressed proteins between severe and non-severe COVID patients to build models to predict COVID severity.

# First, let's make a model_df object of top differentially expressed proteins.
# We will be using example fit_df and norm_df objects provided with the package.
covid_model_df <- pre_process(
  fit_df = covid_fit_df,
  norm_df = covid_norm_df
)

# Next, we split the data into training and test data sets
covid_split_df <- split_data(model_df = covid_model_df)

# Let's train our models using the default list of machine learning algorithms
covid_model_list <- train_models(split_df = covid_split_df)

# We can now use our models to predict the test data
covid_prob_list <- test_models(
  model_list = covid_model_list,
  split_df = covid_split_df
)

Let’s make ROC plots to check how the different models performed.

roc_plot(
  probability_list = covid_prob_list,
  split_df = covid_split_df
)


Tutorials

You can choose a tutorial from the list below that best fits your experiment and the structure of your proteomics data.

  1. This README file can be accessed from RStudio as follows:
vignette("intro_to_promor", package = "promor")

  2. If your data do NOT contain technical replicates: promor: No technical replicates

  3. If your data contain technical replicates: promor: Technical replicates

  4. If you would like to use your proteomics data to build predictive models: promor: Modeling

Contributors

caranathunge

promor's Issues

Is there a way to check for the correlation among biological replicates?

I saw there exists a function to visualize the correlation between technical replicates, but can this also be applied for biological replicates? I, for example, have the impression that one of the biological replicates is a bit of an outlier. Is it also possible to then discard that replicate for the differential analysis?
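One way to approach this question outside of promor is a plain correlation matrix over the sample columns. The sketch below uses base R on simulated data; it does not call any promor function, the replicate names are hypothetical, and "lowest mean correlation" is only one possible heuristic for flagging an outlier.

```r
set.seed(1)

# Simulated log2 intensities: 200 proteins, 4 biological replicates.
base_signal <- rnorm(200, mean = 25, sd = 2)
reps <- sapply(1:4, function(i) base_signal + rnorm(200, sd = 0.3))
colnames(reps) <- paste0("rep_", 1:4)

# Make replicate 4 much noisier to mimic an outlying sample.
reps[, 4] <- base_signal + rnorm(200, sd = 2)

# Pairwise Pearson correlation between replicates, tolerating missing values.
cor_mat <- cor(reps, use = "pairwise.complete.obs")

# Mean correlation of each replicate with all the others.
diag(cor_mat) <- NA
mean_cor <- rowMeans(cor_mat, na.rm = TRUE)

# Flag the replicate that agrees least with the rest and drop its column.
suspect <- names(which.min(mean_cor))
kept <- reps[, colnames(reps) != suspect, drop = FALSE]
suspect
```

In a real analysis, the equivalent of dropping the column would presumably be removing that sample from both the input file and expDesign.txt before calling create_df, but that workflow is an assumption and is not confirmed by this README.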

Incorrect filtering of unique peptides

In the help page for create_df, the option uniq_pep is defined by:

Numerical. The minimum number of unique peptides required to identify a protein (default is 2). Proteins that are identified by less than this number of unique peptides are filtered out. Only applies when input_type = "MaxQuant".

However, in line 244 of create_df.R the data frame is filtered to rows with > uniq_peps. This means that if uniq_pep is left at its default of 2, only proteins identified by 3 or more unique peptides are kept, which contradicts the documented "minimum number" behavior.
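The off-by-one effect described in this issue can be reproduced with a toy data frame. This sketch is not the package's code: the column name `unique_peptides` and the protein labels are hypothetical, and it only contrasts a strict `>` comparison with an inclusive `>=` one.

```r
# Toy protein table; the column name is hypothetical.
pg <- data.frame(
  protein = c("A", "B", "C", "D"),
  unique_peptides = c(1, 2, 3, 5),
  stringsAsFactors = FALSE
)

uniq_pep <- 2  # documented meaning: minimum number of unique peptides

# Strict comparison: drops proteins with exactly uniq_pep unique peptides.
kept_strict <- pg[pg$unique_peptides > uniq_pep, ]

# Inclusive comparison: matches the documented "minimum" semantics.
kept_inclusive <- pg[pg$unique_peptides >= uniq_pep, ]

kept_strict$protein     # "C" "D"
kept_inclusive$protein  # "B" "C" "D"
```

Protein B, identified by exactly two unique peptides, is the one that separates the two behaviors.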

transposed data matrix in impute_na

I just noticed that inside your function promor::impute_na(), the RF, kNN, and SVD functions actually expect features in columns and samples in rows. That means the matrix df should be transposed before calling them. I am not sure whether that would change the results.
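Why orientation can matter is easiest to see with a deliberately simple imputer. The function below is a hypothetical stand-in that fills missing values with the column mean; it is not the RF, kNN, or SVD method, and it says nothing about whether promor's results are actually affected. It only shows that an imputer which treats columns as features gives different answers depending on whether the matrix is transposed first.

```r
# Proteins in rows, samples in columns.
m <- matrix(
  c(10, NA, 12,
    20, 21, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("prot_1", "prot_2"), c("s1", "s2", "s3"))
)

# Hypothetical stand-in imputer that expects features in columns:
# each missing value is replaced by the mean of its column.
impute_by_column_mean <- function(x) {
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    x[miss, j] <- mean(x[, j], na.rm = TRUE)
  }
  x
}

# Called as-is, columns are samples, so values are borrowed across proteins.
as_is <- impute_by_column_mean(m)

# Transposed first, columns are proteins, so values are borrowed across
# samples of the same protein; transpose back to restore the layout.
transposed <- t(impute_by_column_mean(t(m)))

as_is["prot_1", "s2"]       # 21, taken from prot_2
transposed["prot_1", "s2"]  # 11, the mean of prot_1's own values
```

The two calls return matrices of the same shape, so a wrong orientation would not raise an error and could go unnoticed.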
