cogs-fingerprints

Fingerprints inside COGs strings

This repository contains the code for a mini project in the "Topics in Bioinformatics 2" course at BGU.

Article: Parikh Mapping-based algorithm for finding gene clusters
Link: http://www.sciencedirect.com/science/article/pii/S1570866703000352
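
For orientation, here is a brute-force sketch of what a fingerprint is taken to mean here: the set of distinct COGs that occur in one contiguous substring of a COGs string. This reading of the term, the function name naive_fingerprints and the sample data are assumptions for illustration. The loop is quadratic and is not the algorithm from the article or the code in src/.

```python
def naive_fingerprints(cogs):
    """Return the fingerprints of all contiguous substrings of `cogs`.

    `cogs` is a sequence of COG identifiers. Each fingerprint is modelled
    as a frozenset of the distinct identifiers found in one substring.
    """
    fingerprints = set()
    for start in range(len(cogs)):
        seen = set()
        # Extend the substring one position at a time and record the
        # set of distinct COGs seen so far.
        for end in range(start, len(cogs)):
            seen.add(cogs[end])
            fingerprints.add(frozenset(seen))
    return fingerprints


# Hypothetical COGs string, for illustration only.
example = ['0841', '0845', '0841', '3422']
result = naive_fingerprints(example)
```

Note that the set {'0845', '3422'} is not a fingerprint of the example, because no contiguous substring contains exactly those two COGs.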

The implementation is written in Python 2.7.

Before use:

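Before use, please install the Redis server.

Install the Python Redis client via pip, if it is not already installed:

pip install redis

The ast and datetime modules are part of the Python standard library and do not need to be installed separately.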

Preprocess:

Before running the program for the first time, run the following preprocessing commands:

cd <project-folder>
python src/preprocess.py -taxa <path-to-taxa-data-file>
python src/preprocess.py -sigma <path-to-sigma-data-file>
python src/preprocess.py -strings <path-to-strings-data-file>
python src/preprocess.py -cogs <path-to-cogs-info-file>

Available options for preprocess.py:

  • -taxa :
    Builds the taxa DB accordingly
  • -sigma :
    Builds the sigma DB accordingly
  • -strings :
    Builds the strings DB and strains DB accordingly
  • -cogs :
    Builds the COGs function DB and the COGs list DB accordingly
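
If you prefer to script the four preprocessing steps, the sketch below builds the same command lines as shown above. The helper name build_preprocess_commands and the input file names are hypothetical. Only the script path and the four flags come from this README.

```python
# Flags documented above, in the order of the example commands.
PREPROCESS_FLAGS = ('-taxa', '-sigma', '-strings', '-cogs')


def build_preprocess_commands(paths, python='python'):
    """Return one argument list per preprocessing step.

    `paths` maps each flag (for example '-taxa') to its input file path.
    """
    missing = [flag for flag in PREPROCESS_FLAGS if flag not in paths]
    if missing:
        raise ValueError('missing input files for: ' + ', '.join(missing))
    return [[python, 'src/preprocess.py', flag, paths[flag]]
            for flag in PREPROCESS_FLAGS]


# Hypothetical input file names, for illustration only.
commands = build_preprocess_commands({
    '-taxa': 'data/taxa.txt',
    '-sigma': 'data/sigma.txt',
    '-strings': 'data/strings.txt',
    '-cogs': 'data/cogs.txt',
})
```

Each entry in commands can be passed to subprocess.check_call from the project folder, which stops at the first step that fails.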

Main algorithm:

Now you can run the program (the preprocessing steps above do not need to be repeated on the same machine):

cd <project-folder>
python src/run.py <results-directory> <option> <arg>

results-directory is the directory in which the program will save the results file.

  • Can be an absolute or a relative path.
  • Can be an existing folder or a new directory that the program will create.
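
The directory handling described above can be sketched as follows. This is a minimal illustration of "use the folder if it exists, otherwise create it", written to run on Python 2.7 as well. The function name ensure_results_dir is hypothetical, and run.py may do this differently.

```python
import errno
import os


def ensure_results_dir(path):
    """Create `path` if needed and return it as an absolute path."""
    path = os.path.abspath(path)  # accepts absolute or relative input
    try:
        os.makedirs(path)
    except OSError as exc:
        # Ignore "already exists" only if the existing entry is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path
```

Calling the function twice with the same path is safe: the second call finds the directory and returns without error.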

Available options for run.py:

  • -f :
    Runs algorithm for a specific family, for example: -f bacgroup_Acidobacteria.
  • -t :
    Runs algorithm for a specific family type, for example: -t bacgroup.

Postprocess:

For postprocessing:

cd <project-folder>
python src/postprocess.py <results-directory> <family> <options....>
  1. results-directory is the directory from which the program reads the results file created by run.py.
    The postprocessing results are also saved to this directory.
    Can be an absolute or a relative path.
  2. family is the name of the specific family whose results should be processed, for example: bacgroup_Acidobacteria.

Available options for postprocess.py:

  • -threshold :
    Runs postprocessing for the thresholds [0.05, 0.1, 0.2, 0.3, 0.5, 0.8], where each value x in the thresholds array is the fraction of all the strings for this family that have the same fingerprint.
  • -cogs :
    Runs postprocessing for a specific COGs function list, for example: -cogs ['S','V','V'].
    Finds all fingerprints with those functions that are above threshold (as in the previous option) with the addition of threshold 0 for all the fingerprints w.
  • -find :
    Runs postprocessing for a specific COGs list, for example: -find ['0841','0845','3422'].
  • -findWithLen :
    Runs postprocessing for a specific COGs list with a length constraint, for example: -findWithLen ['0841','0845','3422'] 4.
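
To make the -threshold option concrete, here is a minimal sketch of threshold filtering. It assumes that a fingerprint passes threshold x when it occurs in at least a fraction x of the family's strings. The "at least" comparison, the function name and the sample counts are assumptions, not taken from postprocess.py.

```python
THRESHOLDS = [0.05, 0.1, 0.2, 0.3, 0.5, 0.8]


def fingerprints_above_thresholds(occurrences, total_strings,
                                  thresholds=THRESHOLDS):
    """Group fingerprints by the thresholds they reach.

    `occurrences` maps a fingerprint (here a tuple of COG identifiers) to
    the number of the family's strings that contain it. `total_strings`
    is the number of strings in the family.
    """
    if total_strings <= 0:
        raise ValueError('total_strings must be positive')
    result = {}
    for x in thresholds:
        # float() keeps the division exact on Python 2.7 as well.
        result[x] = sorted(
            fp for fp, count in occurrences.items()
            if float(count) / total_strings >= x
        )
    return result


# Hypothetical counts for a family of 20 strings, for illustration only.
sample = {('0841', '0845'): 18, ('3422',): 3, ('0841',): 8}
by_threshold = fingerprints_above_thresholds(sample, 20)
```

With these counts the three fingerprints occur in 90%, 15% and 40% of the strings, so only ('0841', '0845') remains at thresholds 0.5 and 0.8.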

cogs-fingerprints's People

Contributors

alexandraseri