kyu999 / ssw_aligner Goto Github PK
View Code? Open in Web Editor NEWPython implementation of Striped Smith-Waterman Algorithm
Home Page: https://pypi.org/project/ssw_aligner/
Python implementation of Striped Smith-Waterman Algorithm
Home Page: https://pypi.org/project/ssw_aligner/
Hi,
I wonder can seq1 and seq2 be arrays of strings?
For example,
seq1=['ATC','CCG','GCT']
seq2=['CTC','CCG','GCT']
Thank you.
Hi
I try to run the demo code, but can't import the module.
Not sure why sometimes the alignment function returns very low optimal alignment score for almost identical sequences and hence no alignment is returned (query_begin == query_end in the result).
Example:
In [211]: align_func = ssw_aligner.local_pairwise_align_ssw
In [212]: o = align_func(seq1[0:10], seq1[1:11])
In [213]: o
Out[213]:
{
'optimal_alignment_score': 2,
'suboptimal_alignment_score': 0,
'query_begin': 9,
'query_end': 9,
'target_begin': 8,
'target_end_optimal': 8,
'target_end_suboptimal': 0,
'cigar': '1M',
'query_sequence': 'bbbbbbbbBT',
'target_sequence': 'bbbbbbbBTT'
}
Hi,
The example code here uses an aligner with the following scoring scheme:
Match:2, Mismatch:-1, all_gaps_penalties=1
import ssw_aligner
query='CAGACAATCAGCATGTTTCCGGCAGCGCCGGTAG'
target='TTCCACCATTTGTCCGGACCGGGC'
def get_striped_ed_aligner(query_seq,match_score=2):
return ssw_aligner.StripedSmithWaterman(
query_seq,
gap_open_penalty=1,
gap_extend_penalty=1,
match_score=match_score,
mismatch_score=-1,
suppress_sequences=True,
score_only=False,
score_size=2, ## score is < 255 this should be 0; >255 then 1; 2 don't know
zero_index=True,
mask_length=0, ## Turn off suboptimal
mask_auto=False, ## Turn off suboptimal
)
algn=get_striped_ed_aligner(query,match_score=2)(target)
cigcnt=defaultdict(lambda:0)
def add_tl(d,t,l): d[t]+=l; return
[add_tl(cigcnt,t,l) for l, t in algn._tuples_from_cigar()]
print(cigcnt)
print(algn.__repr__())
This results with CIGAR counts and the alignment stats:
defaultdict(<function <lambda> at 0x7f35c0ec2048>, {'M': 19, 'I': 5, 'D': 1})
{
'optimal_alignment_score': 28,
'suboptimal_alignment_score': 0,
'query_begin': 7,
'query_end': 30,
'target_begin': 1,
'target_end_optimal': 21,
'target_end_suboptimal': -1,
'cigar': '7M1I2M1D5M1I1M3I4M',
'query_sequence': '',
'target_sequence': ''
}
The issue here is that the CIGAR string returned in the alignment above is for a suboptimal alignment of score 26, see the computations below:
#CAGACAA |TCAGCATGTT TCCGG CAGCGCCGG| TAG
# T |TCCACAT TTGTCCGG A CCGG| GC
# 'cigar': '7M 1I2M1D 5M1I1M 3I 4M',
# M=5+2+2+5+1+4=19 Mismatch=2 Match=17 I=5 D=1
# score=2*17-2-5-1=26
print(2*17-2-5-1)
26
The optimal alignment with the reported score has a different CIGAR string, see below.
#CAGACAA |TC AG CATGTT TCCGGCAGCGCCGG| TAG
# T |TC CA CAT TTG TCCGG A CCGG| GC
# 'cigar':'2M1D1M1I3M1I2M1D 5M1I1M3I 4M',
# M=18 Mismatch=0 I=6 D=2
# score=2*18-6-2=28
print(2*18-6-2)
28
Is it possible to fix it such that the reported CIGAR string is always for an optimal alignment?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.