asdHBTucker

Conditional Hierarchical Bayesian Tucker Decomposition

Author: Adam Sandler

Date: 3/17/20

Instructions

Tensor decomposition (without CV):

Required data files: asdSparse.csv or cancerSparse.csv
Required packages: Sandia NL Tensor Toolbox

1. Compile appropriate MEX file (sample code provided but commented out in lines 5-6)
Note: parallel version requires OpenMP
2. Adjust any settings in asdTensCV.m (see init_options.m for more info)
Note: for pre-processing of genes, uncomment (line 2):
    asd=asdGeneSelect(asdSparse, .1);
and comment out (line 3):
    asd=sptensor(asdSparse(:,1:3),asdSparse(:,4));
3. Run asdTens.m
4. Output will be in asdHBTucker.mat
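
The `sptensor` call in step 2 treats the first three columns of asdSparse.csv as subscripts and the fourth as the value. For readers who want to inspect that layout outside MATLAB, a minimal Python sketch follows. The sample rows, the helper names `read_sparse_rows` and `tensor_shape`, and the absence of a header row are assumptions made for illustration; none of this is part of the repository.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed layout: index1, index2, index3, value (1-based indices).
SAMPLE = "1,1,1,2\n1,2,1,1\n2,1,3,4\n1,1,1,1\n"

def read_sparse_rows(handle):
    """Read (i, j, k, value) rows into a dict keyed by the index triple.

    Repeated triples are summed, which mirrors the Tensor Toolbox default
    for sptensor as described in its documentation.
    """
    entries = defaultdict(float)
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        i, j, k = (int(float(x)) for x in row[:3])
        entries[(i, j, k)] += float(row[3])
    return dict(entries)

def tensor_shape(entries):
    """Infer the tensor size as the largest index seen in each mode."""
    return tuple(max(idx[d] for idx in entries) for d in range(3))

entries = read_sparse_rows(io.StringIO(SAMPLE))
shape = tensor_shape(entries)
print(shape, len(entries))
```

To read a real file, replace `io.StringIO(SAMPLE)` with `open("asdSparse.csv", newline="")`, after checking whether the file has a header row.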

Tensor decomposition (with CV):

Required data files: asdSparse.csv or cancerSparse.csv
Required packages: Sandia NL Tensor Toolbox

Note: codeDiagram.jpg may be a helpful reference

1. Create a /data/ folder (if one does not exist)
2. Compile appropriate MEX files
Note: parallel version requires OpenMP
3. Adjust any settings in cancerTensCV.m (see init_options.m for more info)
4. Adjust saved data file name(s) as appropriate
5. Run cancerTensCV.m
6. Output will be in /data/ folder
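
The folder setup in step 1 and the idea of splitting samples into CV folds can be sketched in Python as follows. This is an illustration only: the repository does this work in MATLAB (see cancerInd.m and cancerTensCV.m), and the fold count, sample count, seed, and file-name pattern below are hypothetical.

```python
import os
import random
import tempfile

def make_folds(n_samples, n_folds, seed=0):
    """Assign each sample a fold id in 0..n_folds-1.

    Fold sizes differ by at most one; the assignment is shuffled with a
    fixed seed so it can be reproduced.
    """
    fold_ids = [i % n_folds for i in range(n_samples)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids

def fold_path(data_dir, prefix, fold):
    """Build a per-fold output path (hypothetical naming scheme)."""
    return os.path.join(data_dir, f"{prefix}_fold{fold}.mat")

# A temporary directory stands in for the project root in this sketch.
base = tempfile.mkdtemp()
data_dir = os.path.join(base, "data")
os.makedirs(data_dir, exist_ok=True)  # creates the folder only if it is missing
os.makedirs(data_dir, exist_ok=True)  # repeating the call is harmless

folds = make_folds(n_samples=23, n_folds=5)
paths = [fold_path(data_dir, "cancerHBTucker", f) for f in range(5)]
counts = [folds.count(f) for f in range(5)]
print(counts)
```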

Yang model decomposition:

From: "Bayesian Conditional Tensor Factorizations for High-Dimensional Classification" by Yun Yang & David Dunson
Required data files: asdSparse.csv or cancerSparse.csv

1. Create a /data/ folder (if one does not exist)
2. Some settings can be adjusted in the yang.m function
3. Adjust saved data file name(s) as appropriate
4. Run cancerYangCV.m
5. Output will be in /data/ folder

Old Classification:

Required data files: asdHBTucker*.mat or cancerHBTucker*.mat (output of Tensor decomposition)
Required packages: matplotlib, numpy, scipy, sklearn, xgboost (for gbm.py only)

1. Create a /plot/ folder (if one does not exist)
2. Ensure gbm.py, logistic_reg.py, logistic_feature_select.py, ran_forest.py, or svm.py loads the right .mat file
3. Adjust any settings (number of features, regression factors, depth, and/or # of estimators)
4. Run gbm.py, logistic_reg.py, logistic_feature_select.py, ran_forest.py, or svm.py
5. Output will be in command line

ASD Classification Method:

Required data files: asdHBTucker*.mat or cancerHBTucker*.mat (output of Tensor decomposition)
Required packages: matplotlib, numpy, scipy, sklearn

1. Create a /plot/ folder (if one does not exist)
2. Compile data using proper *gatherCVData*.m script
3. Ensure logistic_reg2.py or ran_forest2.py loads the right .mat file
4. Adjust any settings (number of features, regression factors, depth, and/or # of estimators)
5. Run logistic_reg2.py or ran_forest2.py
6. Output will be in command line

Cancer Classification Method:

Required data files: cancerHBTucker*.mat (output of Tensor decomposition)
Required packages: matplotlib, numpy, scipy, sklearn

1. Create a /plot/ folder (if one does not exist)
2. Compile data using proper *gatherCVData*.m script
3. Ensure the chosen logistic_reg*.py script (one other than those named in the sections above) loads the right .mat file
4. Adjust any settings (number of features, regression factors, depth, and/or # of estimators)
5. Run logistic_reg*.py
6. Output will be in command line

Data Pre-Processing:

Required data files: network.txt (KEGG pathways) or Ensembl2Reactome_All_Levels.txt (Reactome pathways), and appropriate ASD or cancer data

1. Run appropriate geneRef*.R script
2. Run appropriate genePath*.R script
3. Run asdProc.R (if using ASD data)
4. Run asdPath2.R or formatCancerData.R

Files

  • acc_cv.py- CV classification accuracy (for cancer dataset)
  • acc_cv2.py- CV classification accuracy - uses csv from no-decomposition model
  • acc_cv_sep.py- CV classification accuracy - useful when each CV fold is in a different file
  • acc_cv_w_gen.py- CV classification accuracy with generated classes
  • acc_yang.py- CV classification accuracy for Yang generated dataset
  • asdGeneSelect.m- method for pre-selecting specific genes based on logistic regression
  • asdGeneSelectCV.m- method for pre-selecting specific genes based on logistic regression, while cross-validating
  • asdGeneSelectCV2.m- method for pre-selecting specific genes based on logistic regression, while cross-validating, using genetic variants only
  • asdHBTucker3.m- hierarchical Bayesian Tucker (HBT) decomposition function
  • asdHBTuckerNew.m- HBT decomposition function (for new/test/validation data)
  • AsdHBTucker.prj- Simulink Project file
  • asdPath2.R- adds pathway information to ASD dataset
  • asdProc.R- processes ASD datasets into single dataset (with patients, genes, & pathways)
  • asdTens.m- main run file
  • asdTensCV.m- main run file for ASD dataset, separates decomposition into CV folds
  • asdTensCVTest.m- main run file, computes groups for CV test folds
  • cancerCP.m- computes CP decomposition for cancer data (using Tensor Toolbox ALS method)
  • cancerCVLL.m- computes log-likelihood (LL) for existing cancer HBT decomposition data
  • cancerGatherCVData.m- gathers proper CV data from HBT decomposition into a single .mat file
  • cancerGatherCVDataAugm_LDANoDecomp.m- gathers proper augmented CV data from LDA & no decomposition into a single .mat file
  • cancerGatherCVDataAugm_LDANoDecompRmTopGVs.m- gathers proper augmented CV data from LDA & no decomposition into a single .mat file (and removes top genetic variants in each topic)
  • cancerGatherCVDataGen.m- gathers proper augmented CV data from HBT decomposition into a single .mat file (for generated data)
  • cancerGatherCVDataLDA.m- gathers proper CV data from LDA decomposition into a single .mat file
  • cancerGatherCVDataLDAGen.m- gathers proper CV data from LDA decomposition into a single .mat file (for generated data)
  • cancerGatherCVDataLDA_rmDomTop.m- gathers proper CV data from LDA decomposition into a single .mat file (and removes top genetic variants in each topic)
  • cancerGatherCVData_noDecomp.m- gathers proper CV data for no decomposition into a single .mat file
  • cancerGatherGenData_noDecomp.m- gathers proper CV data for no decomposition into a single .mat file (for generated data)
  • cancerGenData.m- generates fictitious patients using the HBT generative process (from original ASD/cancer data)
  • cancerGenDataLoad.m- generates fictitious patients using the HBT generative process (from an existing decomposition)
  • cancerGenDataLoadSumPwy.m- generates fictitious patients using the HBT generative process (from an existing decomposition)
  • cancerGenNumInd.m- creates CV indices for generated cancer dataset
  • cancerInd.m- creates CV indices for cancer dataset
  • cancerLLGraph.R- generates LL graph for cancer models
  • cancerTensCV.m- main run file for cancer dataset, separates decomposition into CV folds
  • cancerTensCVGen.m- HBT decomposition function (for generated cancer data)
  • cancerTensGen.m- HBT decomposition function (for generated cancer data, without CV)
  • cancerTensHLDA.m- main run file for cancer dataset for hLDA decompositions, separates decomposition into CV folds
  • cancerYangCV.m- computes Yang model for cancer dataset
  • computeLL.m- computes LL for existing cancer HBT decomposition data (from single .mat file)
  • counts.m- computes counts of samples in tree and for the decomposition tensor (phi & psi)
  • createMRMRcsv.m- creates data csv for use in mRMR
  • crp.m- draws new restaurant from Chinese Restaurant Process (CRP)
  • drawCoreCon.m- draws the core tensor for the conditional Dirichlet distribution
  • drawCoreUni.m- draws the core tensor for the uniform Dirichlet distribution
  • drawZ.m- draws topics for a specific sample
  • drawZc.c- C version of drawZ function
  • drawZsCollapsed.c- C version of drawZs function, collapsed sampling
  • drawZsCollapsedPar.c- C version of drawZs function, collapsed sampling, with OpenMP parallelization
  • drawZsc.c- C version of drawZs function
  • drawZscPar.c- C version of drawZs function, with OpenMP parallelization
  • drawZscSparse.c- draws topics for a specific sample (using sparse tensor representation)
  • drawZscSparsePar.c- draws topics for a specific sample, with OpenMP parallelization (using sparse tensor representation)
  • drchrnd.m- generates probabilities from the Dirichlet distribution
  • elems.m- returns all values between two vectors
  • entropy.m- calculates entropy of probability vector
  • formatCancerData.R- formats cancer data into sparse representation of patients, genes, & pathways
  • gatherCVData.m- collects all data into one file for CV classification
  • gbm.py- uses a gradient boosting model to learn & predict ASD
  • gbm_mi.py- uses a gradient boosting model to learn & predict ASD, with MI feature selection
  • genLDA.py- generates fictitious patients using LDA
  • genePath.R- formats pathway information from KEGG
  • genePath2.R- formats pathway information from Reactome
  • generatePatients.m- generates new patients from trained HBT model
  • generatedDataClass.py- creates classes for generated data
  • generatedDataLDA.py- computes LDA decomposition (on generated data)
  • geneRef.R- converts genes to refGen label (for KEGG)
  • geneRef2.R- converts genes to ensGen label (for Reactome)
  • initPAM.m- initializes variables for initializePAM.m
  • init_options.m- option initialization
  • initializePAM.m- initializes hierarchical DAG from the PAM
  • initializeTree.m- initializes hierarchical tree from the CRP
  • lda.py- class for LDA model (modified from Gensim)
  • ldaCoherence.py- computes LDA model coherence
  • ldaCust.py- trains LDA models, using CV (uses modified LDA code)
  • ldaGV.py- trains LDA models, using CV
  • ldaGVcancer.py- main file for doing LDA decomposition on cancer dataset
  • ldaMI.py- computes mutual information for LDA topics
  • ldaParseLog.py- parses LDA log files to get LL and perplexity values, plot LL over iterations
  • logLikelihood.m- computes the LL of a HBT model
  • logistic_feature_select.py- predict using logistic regression with MI feature selection
  • logistic_feature_select_mrmr.py-
  • logistic_reg.py- predict using logistic regression with regularization
  • logistic_reg2.py- predict using logistic regression with regularization, uses CV tensors
  • logistic_reg3.py- predict using logistic regression with regularization, multi-class accuracy (for cancer dataset)
  • logistic_reg4.py- predict using logistic regression with regularization, multi-class accuracy (for raw CSV data)
  • logistic_reg_yang.py- predict using logistic regression with regularization, multi-class accuracy (for Yang model)
  • logml.m- modified function, used in Yang model
  • logml2.m- function, used in Yang model
  • mRMR.r- selects features using mRMR method
  • multi.m- draws a single sample from the multinomial distribution
  • newPAM- draws PAM model for test documents
  • newTreePaths.m- draws tree for test documents
  • newTreePathsInit.m- draws tree for test documents
  • nn.py- uses a neural network model to learn & predict ASD
  • opt.m- separate file that computes tests for our optimization problem
  • psiCompMH.m- compares psi from original HBT decomposition and that on generated data, using Metropolis-Hastings (MH) algorithm
  • psiMH.m- compares two psi matrices, using MH
  • ran_forest.py- predict using random forest
  • ran_forest2.py- predict using random forest (for CV datasets)
  • ran_forest_mi.py- predict using random forest, with MI feature selection
  • ran_forest_yang.py- predict using random forest (for Yang model)
  • redrawPAM.c- draws the DAG from the PAM
  • redrawTree.c- draws the tree from the CRP
  • rgamma.c- samples small-shape gamma RVs via accept-reject
  • roc_cv.py- computes and plots ROC for each CV
  • roc_cv2.py- computes and plots ROC for each CV, for CV datasets
  • roc_cv_nn.py- computes and plots ROC for each CV, for nn.py
  • sortTopics.m- re-orders topics
  • spliceFormatData.m- reformats gene splice data to include both genes for those unknown
  • spliceFormatDataRand.m- reformats gene splice data to include random gene, from options, for those unknown
  • spliceTensCV.m- HBT decomposition function (for splice data)
  • stirling.m- function, used in Yang model
  • svm.py- computes SVM tests with regularization parameter
  • svm2.py- computes SVM tests with regularization parameter (cleaner version)
  • svm_yang.py- computes SVM tests with regularization parameter (for Yang model)
  • tenDec.py- computes CP decomposition for cancer data (using tensorly package)
  • tensIndex.m- transform cell with lists of indices for each dimension into a matrix of every combination of indices
  • tensIndex2.m- transforms indices from multivariate matrix to single linear index
  • topicCount.m- computes number of topics in each mode for HBT decomposition
  • tuckerCompMH.m- compares two Tucker decompositions using MH
  • tuckerCompPCA.m- compares two Tucker decompositions, using PCA to equate the number of topics
  • tuckerMH.m- compares phi from two Tucker decompositions, using MH
  • yang.m- modified function, computes Yang model decomposition
  • yangAcc.m- computes accuracy of trained model on Yang generated data
  • yangTest.m- computes HBT decomposition on Yang generated data
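
Several of the helpers listed above (crp.m, drchrnd.m, entropy.m) correspond to standard sampling and information-theory primitives. The following Python sketch shows textbook versions of those three operations to make the descriptions concrete. It is not a port of the repository code: the function names, argument order, and the use of the natural logarithm in the entropy are assumptions.

```python
import math
import random

def crp_draw(table_counts, gamma, rng):
    """Draw a table from a Chinese Restaurant Process.

    An existing table k is chosen with probability n_k / (n + gamma) and a
    new table with probability gamma / (n + gamma). The return value is the
    table index; len(table_counts) signals a new table.
    """
    total = sum(table_counts) + gamma
    u = rng.random() * total
    acc = 0.0
    for idx, count in enumerate(table_counts):
        acc += count
        if u < acc:
            return idx
    return len(table_counts)

def dirichlet_draw(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def entropy(p):
    """Shannon entropy of a probability vector (natural log; zeros contribute 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

rng = random.Random(0)
table = crp_draw([2, 1], 1.0, rng)          # 0, 1, or 2 (2 means a new table)
probs = dirichlet_draw([0.5, 1.0, 2.0], rng)
print(table, probs, entropy(probs))
```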
