MATLAB implementation of PANDA & LIONESS algorithms.
The full PANDA and LIONESS algorithms are described in the following literature:
- Glass, Kimberly, et al. "Passing messages between biological networks to refine predicted interactions." PLoS ONE 8.5 (2013): e64832.
- Kuijjer, Marieke Lydia, et al. "Estimating sample-specific regulatory networks." arXiv preprint arXiv:1505.06440 (2015).
Author: cychen ([email protected]), marieke, kimbie.
Original source code adapted from marieke & kimbie's version.
- Set up PANDA run-time parameters by editing panda_config.m.
- Run the PANDA main program via panda_run.sh.
- Run PANDA first to generate the preprocessed intermediate files and the aggregated PANDA network.
- Set up LIONESS run-time parameters by editing lioness_config.m.
- Run the LIONESS main program via lioness_run.sh.
See example input files in test_data/.
The example files include the following data:
- A toy expression profile (1000 genes x 50 samples).
- A list of genome-wide TF-target interactions (motif prior).
- A list of protein-protein interactions (PPIs) between TFs.
- The output PANDA network built from the example data.
This version achieves a 20-25% reduction in computation time and memory usage compared to the earlier version. The following optimizations have been implemented:
Use the MATLAB built-in zscore function instead of computing z-scores from scratch. -> 30% speed-up.
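As an illustration (a toy sketch, not the repository's exact code), the built-in call replaces the manual mean/std normalization:

```matlab
% Toy expression matrix: 5 genes x 4 samples (arbitrary values).
Exp = magic(5);
Exp = Exp(:, 1:4);

% From-scratch column-wise z-score, as in the earlier version.
Z1 = (Exp - repmat(mean(Exp), size(Exp, 1), 1)) ...
     ./ repmat(std(Exp), size(Exp, 1), 1);

% Built-in equivalent: one call, optimized internally.
Z2 = zscore(Exp);

% Both agree up to floating-point error.
max(abs(Z1(:) - Z2(:)))   % close to 0
```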
Use bsxfun instead of repmat, use symmetric matrix multiplication, and reuse the summed-square vector. -> 25% speed-up.
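A minimal example of the bsxfun-for-repmat substitution (illustrative data, not the repository's code):

```matlab
% Example data: a 4x3 matrix and a per-column mean.
A  = rand(4, 3);
mu = mean(A);                           % 1x3 row vector

% repmat materializes a full-size copy of mu before subtracting ...
C1 = A - repmat(mu, size(A, 1), 1);

% ... while bsxfun expands it virtually, saving memory and time.
C2 = bsxfun(@minus, A, mu);

isequal(C1, C2)   % true
```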
Move network normalization out of the PANDA function into the main program to avoid repeating it in every LIONESS iteration: both the PPI network and the motif network need to be normalized only once. -> 1-10 sec saved per LIONESS iteration, depending on network size.
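A hedged sketch of the restructured main program; the function names NormalizeNetwork and panda, and the variable names, are assumptions for illustration, not necessarily the repository's API:

```matlab
% Normalize the priors once, up front (previously redone inside every
% PANDA call, i.e. once per LIONESS iteration).
Motif = NormalizeNetwork(Motif);
PPI   = NormalizeNetwork(PPI);

% Aggregate PANDA network over all samples.
AgNet = panda(Motif, Exp, PPI);

for i = 1:NumSamples
    % Leave-one-out PANDA run reuses the already-normalized priors.
    idx    = [1:(i - 1), (i + 1):NumSamples];
    LocNet = panda(Motif, Exp(:, idx), PPI);
    % Single-sample network per the LIONESS equation.
    PredNet = NumSamples * (AgNet - LocNet) + LocNet;
end
```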
Save the W matrix (R+A) so that it does not have to be computed twice. -> ~0.5% overall speed-up.
Check the Hamming distance inside the while-loop to skip the last, unnecessary update of the TFcoop and GeneCoReg networks. -> ~2% overall speed-up.
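The idea can be shown with a generic fixed-point toy (not the PANDA update itself): once the convergence check passes, the auxiliary update is skipped because no further iteration will consume it, which is exactly why the final TFcoop/GeneCoReg updates can be dropped.

```matlab
% Toy fixed-point iteration (Newton's method for sqrt(2)).
x = 10; x_old = Inf; aux_updates = 0;
while abs(x - x_old) > 1e-6
    x_old = x;
    x = 0.5 * (x + 2 / x);              % cheap main update
    if abs(x - x_old) > 1e-6
        aux_updates = aux_updates + 1;  % "expensive" side update, done
    end                                 % only if another pass follows
end
```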
Use MATLAB binary files (MAT-files) instead of text files for I/O, which boosts performance (>10x faster).
Save the (transposed) input expression matrix, the normalized motif/PPI networks, and the aggregated PANDA network to MAT-files for reuse in each LIONESS run. The expression matrix can only be saved to a v7.3 MAT-file (compressed MATLAB HDF5 format) when its size exceeds 2 GB; the normalized motif/PPI networks and the output PANDA network can be saved as uncompressed v6 MAT-files for best I/O efficiency.
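The version flags look like this in practice; the variable and file names below are placeholders, not the repository's actual file layout:

```matlab
% Transposed toy expression matrix (samples x genes).
Exp = rand(1000, 50)';

% Arrays over 2 GB require the v7.3 (HDF5-based, compressed) format.
save('expression.transposed.mat', 'Exp', '-v7.3');

% Smaller matrices load fastest as uncompressed v6 MAT-files.
AgNet = rand(100, 1000);               % toy aggregated PANDA network
save('agg.panda.mat', 'AgNet', '-v6');

% Each LIONESS iteration then reloads the MAT-file instead of
% re-parsing a text file.
tmp = load('agg.panda.mat');
isequal(tmp.AgNet, AgNet)              % true
```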
Factor this step out into a separate function so that the input matrix does not need to be transposed in each LIONESS iteration. This saves one in-memory copy of the expression matrix and one transpose operation per single-sample LIONESS run.