Latent RNA-seq Analysis
This project explores methods for clustering RNA-seq reads without a reference transcriptome.
The goal of the clustering is to reduce the computational cost of building the de Bruijn graph. Instead of constructing one graph over all sequence reads, a separate graph is built for each cluster.
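To illustrate the idea of building one graph per cluster, here is a minimal sketch. It is not taken from the code in this repository: the function names (`kmers`, `build_de_bruijn`, `build_per_cluster`), the toy reads, and the dictionary-of-sets graph representation are all assumptions made for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def build_per_cluster(clusters, k):
    """Build an independent de Bruijn graph for every cluster of reads."""
    return {cid: build_de_bruijn(reads, k) for cid, reads in clusters.items()}


# Hypothetical clustering result: cluster id -> reads assigned to it.
clusters = {0: ["ACGTAC", "CGTACG"], 1: ["TTTGGA"]}
graphs = build_per_cluster(clusters, k=4)
```

Because each graph only sees the reads of its own cluster, the graphs are smaller than a single graph over all reads and can be built independently of one another.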
The source/ directory contains all of the code.
lsh.py
lsh_functions.py
Example of running LSH (locality-sensitive hashing):
./lsh.py -s 10 -k 21 r1.fastq r2.fastq
This runs LSH with a k-mer length of 21 (-k) and a hash size of 10 (-s).
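As a rough illustration of how k-mer based LSH can group similar reads, the sketch below computes a MinHash signature per read. This README does not say which LSH scheme lsh.py uses or exactly what "hash size" means, so this is an assumption: the example treats the hash size as the number of hash functions in the signature, and every name in it (`kmer_set`, `minhash_signature`, `estimated_similarity`) is hypothetical.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k, num_hashes):
    """Return a MinHash signature: one minimum hash value per hash function."""
    kms = kmer_set(seq, k)
    if not kms:
        raise ValueError("read is shorter than k")
    signature = []
    for seed in range(num_hashes):
        # Assumption: a salted BLAKE2b stands in for a family of hash functions.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for km in kms
        ))
    return tuple(signature)


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of k-mer Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Toy reads with a short k; the command above would correspond to k=21, num_hashes=10.
sig_a = minhash_signature("ACGTACGTTGCA", k=5, num_hashes=10)
sig_b = minhash_signature("ACGTACGTTGCA", k=5, num_hashes=10)
sig_c = minhash_signature("TTTTTGGGGGCC", k=5, num_hashes=10)
```

Reads whose signatures agree in many positions share many k-mers and are candidates for the same cluster; a longer signature gives a finer similarity estimate at the cost of more hashing.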
ThreeStageClustering.py
Clustering.py
Cluster.py
Example of running TSC (three-stage clustering):
./ThreeStageClustering.py r1.fastq r2.fastq