GlyCompareCT

GlyCompareCT is a Python-based command-line tool available at https://github.com/yuz682/GlyCompareCT. The command-line implementation wraps the existing Python package (GlyCompare v1.1.3, https://github.com/LewisLabUCSD/GlyCompare) to increase accessibility by simplifying the user interface. A conda environment yml file is provided for stable installation. Executable files are also available on Zenodo (https://doi.org/10.5281/zenodo.6370789) for Windows (tested on Windows 10, Core i7), Linux (tested on Ubuntu 18.04.6 LTS and CentOS Linux 7 Core), and macOS with an Intel chip (macOS 12.1, Core i7) or an M1 chip.

Mandatory inputs are a glycan abundance table (absolute or relative abundance with rows/columns as samples/glycans; -a <path/to/abundance>) and a glycan annotation table (-v <path/to/annotation>), both in CSV format. GlyCompareCT decomposes glycans into substructures, calculates substructure abundances, and identifies a minimal set of glycomotifs.

GlyCompareCT outputs the glycomotif abundance table, which gives the abundance of the glycomotifs extracted from the input glycoprofiles. Rows represent glycomotifs written as <[S/L]i>, where S and L denote the structural and linkage-specific references, respectively, and i is the index in the local reference glycomotif vector (GlyCompareCT/reference if using the Python script; glyCompareCT_exe_/reference if using executables). Note that local references will be amended to include previously un-indexed substructures; the GitHub reference will be versioned by date and updated occasionally to integrate new substructures. Column names correspond to glycoprofile names, consistent with the input glycan abundance table.
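
As a small illustration of the row-label convention described above, the sketch below splits a label into its reference type and index. It assumes labels look like `S12` or `L3` (one letter followed by digits); the exact label format in the output table may differ, and `parse_motif_label` is a hypothetical helper, not part of GlyCompareCT.

```python
import re

# Assumed label shape: one letter (S or L) followed by a non-negative integer.
_LABEL = re.compile(r"^([SL])(\d+)$")


def parse_motif_label(label):
    """Return (reference_type, index) for a label such as 'S12'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized motif label: {label!r}")
    kind = "structural" if match.group(1) == "S" else "linkage-specific"
    return kind, int(match.group(2))


print(parse_motif_label("S12"))
print(parse_motif_label("L3"))
```

The index returned here would then be looked up in the local reference glycomotif vector mentioned above.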

Citation

Bao, Bokan, Benjamin P. Kellman, Austin WT Chiang, Yujie Zhang, James T. Sorrentino, Austin K. York, Mahmoud A. Mohammad, Morey W. Haymond, Lars Bode, and Nathan E. Lewis. "Correcting for sparsity and interdependence in glycomics by accounting for glycan biosynthesis." Nature Communications 12, no. 1 (2021): 1-14. https://doi.org/10.1038/s41467-021-25183-5

Installation

First, please make sure you have conda installed. Recommended version: conda 4.9.2 or later.

Please git clone the main branch to your target local directory.

# get the repo
git clone https://github.com/yuz682/GlyCompareCT.git
# enter the repo
cd GlyCompareCT

All dependencies required to run GlyCompareCT can be installed using environment.yml. A new conda environment is created with all dependencies installed. This step will take a while (10 - 15 minutes).

# Create the environment with all required dependencies installed.
conda env create -f environment.yml

Activate the new environment glycompareCT. Setup is then complete.

# Activate conda environment
conda activate glycompareCT

Executables

Executables for Windows, macOS (Intel), and Linux can be downloaded from the release or Zenodo. The binary file is glyCompareCT (or glyCompareCT.exe). For convenience, you can add its directory to the PATH variable, for example by adding the following line to ~/.bashrc:

export PATH="<path>/<to>/<glyCompareCT>/<directory>":$PATH

then

source ~/.bashrc
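
As a quick check that the PATH change took effect, the following sketch looks up the binary with Python's standard `shutil.which`. The helper name `find_executable` is only illustrative; the binary name is the one given above.

```python
import shutil


def find_executable(name="glyCompareCT"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


location = find_executable()
print(location if location else "glyCompareCT not found on PATH")
```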

User manual

Please refer to the GlyCompare wiki for the input file format and further details on the input parameters. Some wording there is inconsistent with this tool because the wiki was written for a web app.

Quick start

Retrieve example data

git clone https://github.com/LewisLabUCSD/GlyCompare.git 

GlyCompare decomposition of structural, linkage-specific HMO data with no normalization, 2 cores, integer substructure counting, and epitope-based motif extraction

python glyCompareCT.py structure \
  -a GlyCompare/example_data/paper_hmo/source_data/abundance_table.csv \
  -v GlyCompare/example_data/paper_hmo/source_data/annotation.csv \
  -o output_hmo/ -p glycoCT -c 2

GlyCompare decomposition of structural, linkage-specific HMO data with Probabilistic Quotient normalization, 2 cores, binary substructure counting, and lactose-based motif extraction

python glyCompareCT.py structure \
  -a GlyCompare/example_data/paper_hmo/source_data/abundance_table.csv \
  -v GlyCompare/example_data/paper_hmo/source_data/annotation.csv \
  -o output_hmo/ -p glycoCT -n prob-quot \
  -m binary -c 2 -r lactose
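
When the same run has to be repeated over several datasets, a command like those above can be assembled from Python. The sketch below only uses flags that appear in this README; the function names `build_structure_command` and `run_structure` are hypothetical, and the script is assumed to be invoked from the repository root as in the examples.

```python
import subprocess


def build_structure_command(abundance, annotation, output, syntax,
                            norm=None, multiplier=None, cores=None, root=None):
    """Assemble the argument list for `glyCompareCT.py structure`."""
    cmd = ["python", "glyCompareCT.py", "structure",
           "-a", abundance, "-v", annotation, "-o", output, "-p", syntax]
    # Optional flags are appended only when a value is given.
    for flag, value in (("-n", norm), ("-m", multiplier),
                        ("-c", cores), ("-r", root)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_structure(cmd):
    """Run the command; needs the repository and the conda environment."""
    return subprocess.run(cmd, check=True)


cmd = build_structure_command(
    "GlyCompare/example_data/paper_hmo/source_data/abundance_table.csv",
    "GlyCompare/example_data/paper_hmo/source_data/annotation.csv",
    "output_hmo/", "glycoCT", multiplier="binary", cores=2, root="lactose")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.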

Naive samples

Simple simulated samples can be retrieved from GlyCompareCT/Naive samples/. There are 4 pairs of test samples.

cd Naive\ samples/

python ../glyCompareCT.py structure \
  -a test1_abd.csv \
  -v test1_var.csv \
  -o test1 -p glycoCT -b \
  -m integer -c 2 

Inputs:

Outputs:

Table annotation

The annotate command updates the Glytoucan ID column in a previously generated motif annotation table, or in any table with the same format.

python glyCompareCT.py annotate -n <ANNOTATION TABLE>

Structure data

python glyCompareCT.py structure -a <ABUNDANCE TABLE> -v <GLYCAN ANNOTATION> 
-o <OUTPUT_DIRECTORY> -p <GLYCAN_DATA_TYPE> [-n <NORMALIZATION_MODE>, 
-m <SUBSTRUCTURE_ABUNDANCE_MULTIPLIER>, -c <NUMBER_OF_CORES>, -r <ROOT>, 
-cr <CUSTOM_ROOT>, -d, -s, -b, -i]

Required arguments:

| Parameter | Description |
| --- | --- |
| -a, --abundance | Path to the abundance table, in CSV format |
| -v, --var_annot | Path to the glycan annotation table, in CSV format |
| -o, --output | Directory (folder) in which to save the outputs |
| -p, --syntax | Glycan data type, choose from <'glycoCT', 'iupac_extended', 'linear_code', 'wurcs', 'glytoucan_id'> |

Optional arguments:

| Parameter | Default | Description |
| --- | --- | --- |
| -e, --share | 'private' | Either run locally or register the output motif structures to Glytoucan. Choose from <'private', 'register'>. 'private': run GlyCompareCT locally, without fetching Glytoucan IDs or registering output motifs to Glytoucan. 'register': fetch Glytoucan IDs for the output motif annotation table and register any output motifs without a Glytoucan ID to Glytoucan; requires a Glytoucan contributor ID and API key. |
| -C, --Contributor_ID | '' | User's Glytoucan contributor ID. Can be retrieved from Glytoucan after signing up. Required in -e register mode. |
| -A, --API_key | '' | User's Glytoucan API key. Can be retrieved from Glytoucan after signing up. Required in -e register mode. |
| -s, --no_linkage | None | Add this parameter if the input glycans have no linkage information. By default, linkage information is assumed to be present. |
| -c, --core | 1 | The number of cores to use. |
| -n, --norm | 'none' | Normalization of input glycans within each glycoprofile. Choose from <'none', 'min-max', 'prob-quot'>. 'none': no normalization. 'min-max': each element x is set to (x - min) / (max - min). 'prob-quot': probabilistic quotient normalization, a method commonly used for biological data, described in Dieterle et al. 2006. |
| -b, --no_sub_norm | None | Add this parameter to keep the absolute substructure abundances. If not set, substructure abundances are normalized by sum. |
| -m, --multiplier | 'integer' | Substructure abundance multiplier. Choose from <'binary', 'integer'>. 'binary': 1 if the substructure exists in the glycan, 0 if not. 'integer': the number of occurrences of the substructure in the glycan. |
| -r, --root | 'epitope' | The root substructure of the substructure network. Choose from <'epitope', 'N', 'O', 'lactose', 'custom'>. 'epitope': treat every possible monosaccharide as a root. 'N': the root for N-glycans, GlcNAc. 'O': the root for O-glycans, GalNAc. 'lactose': set the root to lactose, Gal(b1-4)Glc. 'custom': set a custom root; write the custom root in glycoCT format to a txt file and give its path in -cr. |
| -cr, --custom_root | '' | Path to the txt file containing the custom root in glycoCT format. Only specify this if -r is set to 'custom'. |
| -d, --heatmap | None | Add this parameter to draw a cluster map based on the output motif abundance table. |
| -i, --ignore | None | Add this parameter to ignore unrecognized glycan structures and proceed with the rest. |
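
To make the -m and -b options more concrete, the sketch below computes substructure abundances for a single glycoprofile from toy data. It assumes that a substructure's abundance is the sum, over glycans, of the glycan abundance times the multiplier (occurrence count for 'integer', presence or absence for 'binary'), optionally followed by normalization by sum. The glycan names, substructure names, and counts are made up, and this illustrates the option descriptions rather than GlyCompare's actual implementation.

```python
# Hypothetical toy data: occurrences of each substructure in each glycan.
occurrences = {
    "glycan_A": {"sub_1": 2, "sub_2": 1},
    "glycan_B": {"sub_1": 1, "sub_3": 1},
}
# Hypothetical glycan abundances in one glycoprofile.
profile = {"glycan_A": 0.75, "glycan_B": 0.25}


def substructure_abundance(profile, occurrences,
                           multiplier="integer", normalize=True):
    """Sum glycan abundance times multiplier for every substructure."""
    if multiplier not in ("binary", "integer"):
        raise ValueError("multiplier must be 'binary' or 'integer'")
    totals = {}
    for glycan, abundance in profile.items():
        for sub, count in occurrences[glycan].items():
            weight = count if multiplier == "integer" else min(count, 1)
            totals[sub] = totals.get(sub, 0.0) + abundance * weight
    if normalize:  # in this sketch, normalize=False plays the role of -b
        total = sum(totals.values())
        if total > 0:
            totals = {sub: value / total for sub, value in totals.items()}
    return totals


print(substructure_abundance(profile, occurrences, "integer", normalize=False))
print(substructure_abundance(profile, occurrences, "binary", normalize=False))
print(substructure_abundance(profile, occurrences, "integer", normalize=True))
```

In the toy data, `sub_1` occurs twice in `glycan_A`, so the two multipliers give different totals for it while `sub_2` and `sub_3` are unaffected.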

Composition data

python glyCompareCT.py composition -a <ABUNDANCE TABLE> -v <GLYCAN ANNOTATION> 
-o <OUTPUT_DIRECTORY> [-n <NORMALIZATION_MODE>, -i]

Required arguments:

| Parameter | Description |
| --- | --- |
| -a, --abundance | Path to the abundance table, in CSV format |
| -v, --var_annot | Path to the glycan annotation table, in CSV format |
| -o, --output | Directory (folder) in which to save the outputs |

Optional arguments:

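| Parameter | Default | Description |
| --- | --- | --- |
| -n, --norm | 'none' | Normalization of input glycans within each glycoprofile. Choose from <'none', 'min-max', 'prob-quot'>. 'none': no normalization. 'min-max': each element x is set to (x - min) / (max - min). 'prob-quot': probabilistic quotient normalization, described in Dieterle et al. 2006. |
| -i, --ignore | None | Add this parameter to ignore unrecognized glycan compositions and proceed with the rest. |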
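
The two normalization modes named for -n can be sketched in plain Python. The min-max function follows the formula given for that option. The probabilistic quotient function follows the general idea of the method (divide each profile by the median of its quotients against a reference profile); using the per-glycan median across profiles as the reference and skipping zero reference values are assumptions of this sketch, and GlyCompare's own implementation may differ in these details.

```python
from statistics import median


def min_max(profile):
    """Scale one glycoprofile so that its values span [0, 1]."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # Constant profile: mapping everything to 0 is an assumption here.
        return [0.0 for _ in profile]
    return [(x - lo) / (hi - lo) for x in profile]


def prob_quot(profiles):
    """Probabilistic quotient normalization over a list of glycoprofiles."""
    n_glycans = len(profiles[0])
    # Reference profile: per-glycan median across all profiles (assumption).
    reference = [median(p[j] for p in profiles) for j in range(n_glycans)]
    normalized = []
    for p in profiles:
        quotients = [p[j] / reference[j]
                     for j in range(n_glycans) if reference[j] > 0]
        factor = median(quotients) if quotients else 1.0
        if factor == 0:
            factor = 1.0  # avoid dividing by zero for empty-looking profiles
        normalized.append([x / factor for x in p])
    return normalized


print(min_max([2, 4, 6]))
print(prob_quot([[1, 2, 4], [2, 4, 8], [3, 6, 12]]))
```

In the second call the three toy profiles differ only by an overall scale factor, so they become identical after normalization.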

glycomparect's Issues

TypeError when adding the

I was using the GlyCompareCT executable, and got the error below:

Creating motif abundance table...
the glycan core is
Traceback (most recent call last):
  File "glyCompareCT.py", line 670, in <module>
  File "glyCompareCT.py", line 57, in main
  File "glyCompareCT.py", line 349, in structure
  File "glycompare/pipeline_functions.py", line 770, in select_motifs_pip
  File "glycompare/select_motifs.py", line 266, in __init__
TypeError: __init__() missing 1 required positional argument: 'num_processors'
[58627] Failed to execute script 'glyCompareCT' due to unhandled exception!

This error only appears when adding the "-d" argument. It happens for both glycompareCT.py and glycompareCT executable.

More information for your reference:

  • Device: MacBook Air M1, 2020
  • System: MacOS Ventura 13.0.1
  • Python version (for glycompareCT.py): 3.10.6

Issues with running example data sets

Hi,
I am attempting to get GlyCompareCT up and running, however, I run into issues in analyzing the example data sets. Here is my most recent error:

(glycompareCT) C:\Users\seaba\GlyCompareCT\Examples>python ../glyCompareCT.py structure -a test1_abd.csv -v test1_var.csv -o test1/ -p glycoCT -b -m integer -c 2
Validating input files...
Traceback (most recent call last):
  File "C:\Users\seaba\GlyCompareCT\glyCompareCT.py", line 685, in <module>
    main()
  File "C:\Users\seaba\GlyCompareCT\glyCompareCT.py", line 54, in main
    input_validation(args)
  File "C:\Users\seaba\GlyCompareCT\glyCompareCT.py", line 103, in input_validation
    assert os.path.isdir(os.sep.join(args.output_directory.split(os.sep)[:-1])), "Invalid output path, check the path: " + os.sep.join(args.output_directory.split(os.sep)[:-1])
AssertionError: Invalid output path, check the path:

Note a directory to test1 has been created but this error is still returned.
My current directory is

`
(glycompareCT) C:\Users\seaba\GlyCompareCT\Examples>dir
Volume in drive C is Windows
Volume Serial Number is A414-7917

Directory of C:\Users\seaba\GlyCompareCT\Examples

2024-05-03 17:28 <DIR> .
2024-05-03 17:29 <DIR> ..
2024-05-03 17:14 8 196 .DS_Store
2024-05-03 17:14 <DIR> Jin2017
2024-05-03 17:28 <DIR> test1
2024-05-03 17:14 28 test1_abd.csv
2024-05-03 17:14 2 084 test1_var.csv
2024-05-03 17:14 28 test2_abd.csv
2024-05-03 17:14 2 084 test2_var.csv
2024-05-03 17:14 42 test3_abd.csv
2024-05-03 17:14 2 084 test3_var.csv
2024-05-03 17:14 28 test4_abd.csv
2024-05-03 17:14 2 084 test4_var.csv
9 File(s) 16 658 bytes
4 Dir(s) 344 891 121 664 bytes free
`

Current conda version is 24.4.0 and current python version is 3.10.14.final.0
platform : win-64
Any help would be greatly appreciated, thanks.
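
One possible reading of the assertion in the traceback above: it derives the parent directory by splitting the output path on `os.sep`, which is `\` on Windows. A path written with `/` (here `test1/`) then contains no separator, the derived parent is the empty string, and `os.path.isdir("")` is False, which would also explain the empty path in the error message. The sketch below reproduces that expression with the separator passed explicitly, so it behaves the same on any platform; `parent_by_split` is a hypothetical helper written for this illustration.

```python
def parent_by_split(path, sep):
    """Mirror the traceback's expression: sep.join(path.split(sep)[:-1])."""
    return sep.join(path.split(sep)[:-1])


# Windows separator with a forward-slash path, as in the failing command.
print(repr(parent_by_split("test1/", "\\")))
# Windows separator with a backslash-terminated path.
print(repr(parent_by_split("test1\\", "\\")))
# POSIX separator with the same forward-slash path.
print(repr(parent_by_split("test1/", "/")))
```

Under this reading, writing the output path with a trailing backslash (for example `-o test1\`) would pass the check on Windows; this has not been verified against the tool itself.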

Silent run fail

I tried running GlyCompare on the Glytoucan IDs in GlyGen and the code failed. I think it failed shortly after the input validation. There is no indication of why it failed, so I cannot debug.
see: https://github.com/bkellman/SubstrID
Execution details: System Ubuntu v20, Execution: CT function, not the executable

Can we add a log output? Even if it doesn't give reasons for failure, it can still indicate which steps were triggered before the code halted.

@yuz682 can you please have a look at the repo linked above and see if you can figure out why it wouldn't run.
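
A minimal sketch of the kind of step log requested here, using Python's standard `logging` module. The function names, the logger name, and the log file name are hypothetical and are not part of GlyCompareCT; the pipeline steps would be passed in as (name, callable) pairs.

```python
import logging


def configure_logging(log_path="glycompare_run.log"):
    """Return a logger that appends timestamped messages to `log_path`."""
    logger = logging.getLogger("glycompare_run")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_steps(steps, logger):
    """Run (name, callable) pairs in order, logging start, end, and failures."""
    for name, func in steps:
        logger.info("start: %s", name)
        try:
            func()
        except Exception:
            # Writes the full traceback to the log before re-raising.
            logger.exception("failed: %s", name)
            raise
        logger.info("done: %s", name)
```

With steps named after the messages the tool already prints (such as "Validating input files..." and "Creating motif abundance table..."), the last "start" line without a matching "done" line would show where a run halted.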
