DeLUCS

This repository contains all the source files required to run DeLUCS (https://doi.org/10.1101/2021.05.13.444008), a deep learning clustering tool for DNA sequences, as well as a detailed guide for running the code.

Computational Pipeline:

1. Build the dataset:

  python build_dp.py --data_path=<PATH_sequence_folder>
  • Input: Folders with the sequences in FASTA format.
  • Output: Pickle file in the form (label, sequence, accession).
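The (label, sequence, accession) records can be pictured with a minimal sketch. It assumes that the name of each sequence folder serves as the label and that the accession is the first token of each FASTA header. Both are assumptions made for illustration, and `parse_fasta` is a hypothetical helper, not a function from build_dp.py.

```python
from typing import Iterable, List, Tuple


def parse_fasta(lines: Iterable[str], label: str) -> List[Tuple[str, str, str]]:
    """Return (label, sequence, accession) tuples from FASTA-formatted lines."""
    records: List[Tuple[str, str, str]] = []
    accession = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                records.append((label, "".join(chunks), accession))
            # Assumption: the accession is the first whitespace-delimited token.
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if accession is not None:
        records.append((label, "".join(chunks), accession))
    return records


example = [">NC_0001.1 some virus", "ACGT", "ttaa", "", ">NC_0002.1", "GGCC"]
records = parse_fasta(example, "ClusterA")
```

A list of such tuples could then be written to disk with `pickle.dump`, although the exact on-disk layout used by build_dp.py may differ.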

2. Compute the mimic sequences.

  python get_pairs.py --data_path=<PATH_pickle_dataset> --k=6 --modify='mutation' --output=<PATH_output_file> --n_mimics=<n mimics per sequence>
  • Input: File in the form (label, sequence, accession).
  • Output: File in the form (pairs, x_test, y_test).
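To illustrate what a mimic is, the sketch below produces mutated copies of a sequence and converts each one to a k-mer count vector with k=6. The mutation rate, the uniform substitution model, and the helper names `mutate` and `kmer_counts` are assumptions made for illustration; the actual procedure in get_pairs.py may differ.

```python
import random
from itertools import product
from typing import List

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if base in BASES and rng.random() < rate:
            # Replace the base with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_counts(seq: str, k: int) -> List[int]:
    """Count every k-mer over ACGT; windows containing other symbols are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "ACGTACGTTGCA" * 10
mimics = [mutate(original, 0.01, rng) for _ in range(3)]  # analogous to --n_mimics=3
# Each pair holds the k-mer vector of the original and of one of its mimics.
pairs = [(kmer_counts(original, 6), kmer_counts(m, 6)) for m in mimics]
```

Each vector has 4^6 = 4096 entries, one per possible 6-mer.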

3. Train the model.

  • For training DeLUCS to cluster your own data (No ground truth available):

     python TrainDeLUCS.py --n_clusters=<number of clusters> --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
    
    • Input: Pickle file with the mimics (the path always ends with /testing_data.p).
    • Output: Pickle file with the cluster assignments for each sequence.
  • For training DeLUCS and testing its performance with your own data (labels must be available):

     python EvaluateDeLUCS.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
    
    • Input: Pickle file with the mimics in the form (pairs, x_test, y_test).
    • Output: Confusion matrix.
  • For training a single Neural Network in an unsupervised way:

     python SingleRun.py --n_clusters=<number of clusters> --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
    
  • For testing the performance of a single Neural Network trained in an unsupervised way (labels must be available):

     python EvaluateSingleRun.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
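When labels are available, cluster assignments can be compared against them through a confusion matrix. The following is a minimal sketch, assuming that labels and predictions are integer lists of equal length with values in 0..n-1. Because cluster indices are arbitrary, it also searches for the cluster-to-label matching with the most agreements; the brute-force search over permutations is a stand-in that is only practical for a handful of clusters. The function names are hypothetical and are not taken from EvaluateDeLUCS.py.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n: int) -> List[List[int]]:
    """Rows are true labels, columns are predicted clusters."""
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def best_matching_accuracy(matrix: List[List[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Return the accuracy under the best one-to-one matching of labels to clusters."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    best_hits, best_perm = -1, tuple(range(n))
    for perm in permutations(range(n)):
        # perm[t] is the cluster matched to true label t.
        hits = sum(matrix[t][perm[t]] for t in range(n))
        if hits > best_hits:
            best_hits, best_perm = hits, perm
    return best_hits / total, best_perm


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 1]  # hypothetical cluster assignments
cm = confusion_matrix(y_true, y_pred, 3)
accuracy, matching = best_matching_accuracy(cm)
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the permutation search.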
    

To Do:

  • Optimize the testing process
  • Add training parameters to the command line input
  • Redefine the input structure
  • Define a customizable architecture
