
ssw_aligner's People

Contributors

kyu999


ssw_aligner's Issues

align two arrays

Hi,
Can seq1 and seq2 be arrays of strings?
For example,
seq1=['ATC','CCG','GCT']
seq2=['CTC','CCG','GCT']
Thank you.

low optimal alignment score for almost identical sequences

The alignment function sometimes returns a very low optimal alignment score for almost identical sequences, so effectively no alignment is returned (query_begin == query_end in the result). I am not sure why.
Example:
In [211]: align_func = ssw_aligner.local_pairwise_align_ssw
In [212]: o = align_func(seq1[0:10], seq1[1:11])
In [213]: o
Out[213]:
{
'optimal_alignment_score': 2,
'suboptimal_alignment_score': 0,
'query_begin': 9,
'query_end': 9,
'target_begin': 8,
'target_end_optimal': 8,
'target_end_suboptimal': 0,
'cigar': '1M',
'query_sequence': 'bbbbbbbbBT',
'target_sequence': 'bbbbbbbBTT'
}

Reported CIGAR string is for a suboptimal alignment in some cases.

Hi,

The example code below uses an aligner with the following scoring scheme:
match: 2, mismatch: -1, gap open penalty: 1, gap extend penalty: 1

from collections import defaultdict

import ssw_aligner

query = 'CAGACAATCAGCATGTTTCCGGCAGCGCCGGTAG'
target = 'TTCCACCATTTGTCCGGACCGGGC'

def get_striped_ed_aligner(query_seq, match_score=2):
    return ssw_aligner.StripedSmithWaterman(
        query_seq,
        gap_open_penalty=1,
        gap_extend_penalty=1,
        match_score=match_score,
        mismatch_score=-1,
        suppress_sequences=True,
        score_only=False,
        score_size=2,     # 0 if the score is expected to be < 255, 1 if > 255, 2 if unknown
        zero_index=True,
        mask_length=0,    # turn off suboptimal
        mask_auto=False,  # turn off suboptimal
    )

algn = get_striped_ed_aligner(query, match_score=2)(target)

# Sum the run lengths of each CIGAR operation (M, I, D).
cigcnt = defaultdict(lambda: 0)
for length, op in algn._tuples_from_cigar():
    cigcnt[op] += length
print(cigcnt)

print(repr(algn))

This prints the CIGAR counts and the alignment stats:

defaultdict(<function <lambda> at 0x7f35c0ec2048>, {'M': 19, 'I': 5, 'D': 1})
{
    'optimal_alignment_score': 28,
    'suboptimal_alignment_score': 0,
    'query_begin': 7,
    'query_end': 30,
    'target_begin': 1,
    'target_end_optimal': 21,
    'target_end_suboptimal': -1,
    'cigar': '7M1I2M1D5M1I1M3I4M',
    'query_sequence': '',
    'target_sequence': ''
}

The issue is that the CIGAR string returned above describes a suboptimal alignment with a score of 26, not the reported 28. See the computation below:

#CAGACAA |TCAGCATGTT TCCGG CAGCGCCGG| TAG
#      T |TCCACAT TTGTCCGG  A   CCGG| GC
# 'cigar':  '7M    1I2M1D 5M1I1M 3I  4M',
#                           M=5+2+2+5+1+4=19  Mismatch=2  Match=17 I=5 D=1
#                           score=2*17-2-5-1=26
print(2*17-2-5-1)
26

The optimal alignment with the reported score has a different CIGAR string, see below.

#CAGACAA |TC  AG  CATGTT  TCCGGCAGCGCCGG| TAG
#      T |TC CA   CAT TTG TCCGG A   CCGG| GC
# 'cigar':'2M1D1M1I3M1I2M1D  5M1I1M3I  4M',
#                           M=18 Mismatch=0 I=6 D=2  
#                           score=2*18-6-2=28
print(2*18-6-2)
28
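
The two hand computations above can be reproduced with a small script. This is only a sketch: `cigar_counts` and `cigar_score` are hypothetical helpers, not part of ssw_aligner. It assumes every inserted or deleted base costs 1 (gap open = gap extend = 1, as in the computations above) and takes the number of mismatched columns as an input counted by hand, since `M` in a CIGAR string covers both matches and mismatches.

```python
import re
from collections import Counter

def cigar_counts(cigar):
    """Sum the run lengths of each operation in a CIGAR string."""
    counts = Counter()
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        counts[op] += int(length)
    return counts

def cigar_score(cigar, mismatches, match=2, mismatch=-1, gap=1):
    """Score an alignment from its CIGAR string.

    `mismatches` is the hand-counted number of mismatched columns;
    `gap` is the assumed cost of each inserted or deleted base.
    """
    counts = cigar_counts(cigar)
    matches = counts["M"] - mismatches
    gap_bases = counts["I"] + counts["D"]
    return match * matches + mismatch * mismatches - gap * gap_bases

reported = "7M1I2M1D5M1I1M3I4M"              # CIGAR returned by the aligner
alternative = "2M1D1M1I3M1I2M1D5M1I1M3I4M"   # CIGAR of the hand-built alignment

print(dict(cigar_counts(reported)), cigar_score(reported, mismatches=2))
print(dict(cigar_counts(alternative)), cigar_score(alternative, mismatches=0))
```

With the mismatch counts taken from the two diagrams (2 and 0), the scores come out as 26 and 28.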

Could this be fixed so that the reported CIGAR string always corresponds to an optimal alignment?
