ProgressiveAligner

Join the chat at https://gitter.im/latticetower/ProgressiveAligner.jl

This package contains a progressive alignment tool for protein sequences, written in the Julia language (http://julialang.org/).

It builds a phylogenetic tree with the neighbour joining, UPGMA, or WPGMA algorithm, then aligns the protein sequences by their profiles.

Usage examples can currently be found in the test folder.

Typical usage pipeline

  1. Call methods from the DataReader submodule to read data from files.

One way to get protein sequences is to read them from a file:

sequences = readSequences(dirname(@__FILE__()) * "/../data/input_test_sequences.faa")

After this call, sequences is an Array of FastaRecord objects. A FastaRecord object can also be created directly from a description and a protein sequence string:

fasta_record = FastaRecord("test string1", "ACGT")

The alignment algorithm uses an alignment score matrix to score sequences. This matrix can be loaded from an alignment matrix file, which can be downloaded from the NCBI FTP site.

matrix = readMatrix(dirname(@__FILE__()) * "/../data/blosum62.txt")

  2. Convert and prepare data.

The alignment algorithm doesn't use FastaRecord objects directly. It converts the protein strings to Profile objects, then merges these objects, using one of the tree building methods, into a single Profile. That Profile represents the multiple alignment and can be converted to an Array of FastaRecord objects with gaps.

The first step is to convert FastaRecords or strings into an Array of Profiles:
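To make the idea of a profile more concrete, here is a minimal sketch of one common way to summarize aligned strings: per-column residue frequencies. The function `column_frequencies` is hypothetical and is not part of this package; the package's Profile type may store its data differently.

```julia
# Hypothetical illustration, not the package's Profile type:
# per-column residue frequencies of equally long, already aligned strings.
function column_frequencies(rows::Vector{String})
    ncols = length(rows[1])
    profile = [Dict{Char,Float64}() for _ in 1:ncols]
    for row in rows, (col, residue) in enumerate(row)
        # Each row contributes an equal share to its residue in this column.
        profile[col][residue] = get(profile[col], residue, 0.0) + 1 / length(rows)
    end
    return profile
end

profile = column_frequencies(["CAP-", "CAPT"])
profile[1]   # column 1 holds only 'C'
profile[4]   # column 4 is split between '-' and 'T'
```

A column where all rows agree holds a single residue with frequency 1.0, while a column with a gap in one of two rows splits the weight evenly.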

strToProfiles(strings :: Vector{FastaRecord}) = [Profile{Float64}(record.sequence, record.description) :: Profile{Float64} for record in strings]

profiles = strToProfiles(sequences)

For a collection of strings, the profile array can be created in a similar way:

start_vertices2 = [Profile{Float64}(str)::Profile{Float64} for str in
    [
      "CAP",
      "CAPT",
      "APT",
      "PPT"
      ]]

The second step is to set the current scoring matrix. This can be done with the call

ProfileAligner.setScoringMatrix(matrix)

  3. Define score and merge functions:

function scoreFunc(p1 :: Profile{Float64}, p2 :: Profile{Float64})
  ProfileAligner.scoreprofiles(p1, p2)
end

function mergeFunc(p1 :: Profile{Float64}, p2 :: Profile{Float64})
  ProfileAligner.align(p1, p2)
end

The score function returns the best alignment score for a given pair of profiles. The merge function returns the resulting profile object, built from the best-scoring alignment. The main difference between the two is that the score function can be computed faster and can consume less memory.

The ProfileAligner submodule currently provides default implementations of both methods (the corresponding methods are shown in the previous code example). They use the score matrix set by the setScoringMatrix call to select the best-scoring profile alignment.
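One reason a score-only computation can be cheaper is that it needs no traceback, so only two rows of the dynamic programming table have to be kept. The sketch below shows this for plain strings with a simple match/mismatch/gap scheme. The function `nw_score` and its keyword parameters are hypothetical; it does not reproduce how ProfileAligner.scoreprofiles works internally.

```julia
# Hypothetical helper, not part of ProgressiveAligner: global alignment score
# of two plain strings, keeping only two rows of the DP table.
function nw_score(a::AbstractString, b::AbstractString;
                  match_score::Int = 1, mismatch_score::Int = -1, gap_score::Int = -1)
    x, y = collect(a), collect(b)
    prev = [j * gap_score for j in 0:length(y)]  # row for the empty prefix of a
    curr = similar(prev)
    for i in 1:length(x)
        curr[1] = i * gap_score                  # x[1:i] aligned against nothing
        for j in 1:length(y)
            diag = prev[j] + (x[i] == y[j] ? match_score : mismatch_score)
            curr[j + 1] = max(diag, prev[j + 1] + gap_score, curr[j] + gap_score)
        end
        prev, curr = curr, prev                  # swap buffers instead of allocating
    end
    return prev[end]
end

nw_score("CAP", "CAPT")  # three matches and one gap
```

Recovering the alignment itself would require either the full table or a more elaborate scheme, which is why a merge step is generally more expensive than a score step.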

  4. Select one of the tree-based methods to perform multiple alignment.

Currently 3 clustering algorithms are implemented: NeighbourJoining, UPGMA, and WPGMA. They have similar signatures and can be called like this:

njResult = NeighbourJoining(profiles, scoreFunc, mergeFunc)
wpgmaResult = WPGMA(profiles, scoreFunc, mergeFunc)
upgmaResult = UPGMA(profiles, scoreFunc, mergeFunc)

The result of each of these calls is a single Profile object, which represents the multiple alignment.

  5. Convert the alignment result to a readable form and save the results to a file.
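For readers unfamiliar with the two average-linkage variants, the textbook difference lies in how the distance from a newly merged cluster to every other cluster is updated. The sketch below uses the standard definitions on distances; the function names are hypothetical, and whether this package applies exactly these formulas to its scores is an assumption.

```julia
# Distance from a merged cluster (i and j joined) to another cluster k.
# UPGMA weights each child by its number of leaves; WPGMA ignores sizes.
upgma_update(d_ik, d_jk, n_i, n_j) = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
wpgma_update(d_ik, d_jk) = (d_ik + d_jk) / 2

# Cluster i holds 3 sequences, cluster j holds 1.
upgma_update(4.0, 8.0, 3, 1)  # 5.0, pulled towards the larger cluster
wpgma_update(4.0, 8.0)        # 6.0, plain average of the two children
```

The two rules agree whenever the merged clusters have the same size, so they can only produce different trees once clusters of unequal size are joined.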

The ProfileAligner submodule provides a utility method, getstrings, which converts the Profile representation back to an array of FastaRecord objects (possibly with gaps).

getstrings(result)

There is also a utility submodule, DataWriter, which can be used to save the resulting multiple sequence alignment to a .fasta-like file.

writeSequences(output_file, getstrings(result))

The first parameter should contain the name of the file in which to save these records.

The file format differs from typical .fasta in only one respect: in the resulting sequence alignment, each string can contain gaps, represented by the '-' symbol. Descriptions are kept so that each aligned string can be traced back to its source, since clustering algorithms mix the input nodes and can change their order.
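Because the output order may differ from the input order, descriptions can serve as keys to line the aligned rows up with the original sequences. The sketch below works on plain (description, sequence) tuples rather than FastaRecord objects; the helper names and the sample data are made up for illustration.

```julia
# Hypothetical helpers operating on (description, sequence) tuples.
# Return the aligned rows in the order given by `descriptions`.
function restore_input_order(descriptions, aligned)
    by_description = Dict(desc => seq for (desc, seq) in aligned)
    return [(desc, by_description[desc]) for desc in descriptions]
end

# Remove gap symbols to recover the unaligned sequence.
strip_gaps(seq) = replace(seq, "-" => "")

# Made-up aligned output whose order differs from the input order.
aligned = [("seq3", "-APT"), ("seq1", "CAP-"), ("seq2", "CAPT")]
ordered = restore_input_order(["seq1", "seq2", "seq3"], aligned)
strip_gaps(ordered[1][2])  # "CAP"
```

This relies on descriptions being unique within one input file; duplicate descriptions would overwrite each other in the lookup table.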
