abpoa's People

Contributors

cjw85, ekg, emollier, glennhickey, imciner2, yangao07

abpoa's Issues

Are input QVs meaningful?

Hi!

Quick question: from SPOA I've learned that using per-base quality values to weight nodes in the graph increases the accuracy of the generated consensus, especially with PacBio HiFi input. Have you experimented with such a feature?

Thanks,
Armin

Is there any way to reduce memory consumption?

Hello, I'm experimenting with adding abPOA as an option within cactus (manuscript). Thanks for making a great tool -- it's amazingly fast.

I was wondering if there's a way to reduce memory consumption, however, in order to increase the sequence lengths I can run on. Right now it seems roughly quadratic in the sequence length, which is as expected when reading your manuscript. I'm curious to know if there are any options I can use to reduce this and/or if you've thought about using the banding to reduce the DP table size (as far as I can tell, it's only used to reduce computation)?
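As a rough illustration of why storing only the band would help, the sketch below compares the number of DP cells in a full table with a banded one. It assumes a linear graph, a fixed band of `w` cells on each side of the diagonal, and 4 bytes per cell across 3 affine-gap matrices; these figures are illustrative assumptions, not values taken from abPOA.

```python
def dp_cells_full(graph_len: int, seq_len: int) -> int:
    """Cells in a full DP table: one row per graph node, one column per base."""
    return (graph_len + 1) * (seq_len + 1)


def dp_cells_banded(graph_len: int, seq_len: int, w: int) -> int:
    """Cells if each row stores only a band of at most 2*w + 1 columns."""
    row_width = min(seq_len + 1, 2 * w + 1)
    return (graph_len + 1) * row_width


def estimate_bytes(cells: int, bytes_per_cell: int = 4, n_matrices: int = 3) -> int:
    """Memory estimate; bytes_per_cell and n_matrices are assumed values."""
    return cells * bytes_per_cell * n_matrices


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        full = estimate_bytes(dp_cells_full(n, n))
        band = estimate_bytes(dp_cells_banded(n, n, w=100))
        print(f"n={n}: full={full / 1e9:.3f} GB, banded={band / 1e9:.3f} GB")
```

Under these assumptions the full table grows quadratically with the sequence length, while the banded table grows linearly for a fixed band width.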

ARM support?

Have you considered supporting other architectures, such as ARM, in abpoa? They're becoming increasingly popular due to Apple and Amazon, among others.

One way to try without rewriting everything is with a wrapper layer like https://github.com/simd-everywhere/simde

Error in logic for read ID labeling at source node

Hello,
I'm writing Nim bindings for abPOA for a project, and I think I stumbled upon an error in the logic of the function abpoa_add_subgraph_alignment, specifically in the labeling of read IDs at the source node. At several points in this function there's a line:
if (last_id == beg_node_id && beg_node_id != ABPOA_SRC_NODE_ID) w = 0, add = 0; else w = 1, add = 1;

Based on my understanding of the desired result here, I think beg_node_id != ABPOA_SRC_NODE_ID should be beg_node_id == ABPOA_SRC_NODE_ID.

I could be wrong; if it's intended behavior to add weights and read IDs at the source node, then please correct me.
Thank you for the tool, by the way. It's quite impressive work.
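To make the difference between the two conditions concrete, here is a small Python sketch that mirrors the quoted C line and the proposed change, and tabulates the resulting (w, add) pair. It assumes the source node ID is 0; the function names are hypothetical and the sketch does not say which behavior is the intended one.

```python
SRC_NODE_ID = 0  # assumed value of ABPOA_SRC_NODE_ID


def current_rule(last_id: int, beg_node_id: int, src: int = SRC_NODE_ID):
    """Mirror of the quoted line: skip weight/read ID when aligned to a non-source begin node."""
    if last_id == beg_node_id and beg_node_id != src:
        return (0, 0)  # (w, add)
    return (1, 1)


def proposed_rule(last_id: int, beg_node_id: int, src: int = SRC_NODE_ID):
    """Variant suggested in the issue: skip weight/read ID only at the source node."""
    if last_id == beg_node_id and beg_node_id == src:
        return (0, 0)
    return (1, 1)


if __name__ == "__main__":
    for last_id, beg in [(0, 0), (5, 5), (3, 5)]:
        print(last_id, beg, current_rule(last_id, beg), proposed_rule(last_id, beg))
```

The two rules disagree exactly when last_id equals beg_node_id, and agree whenever they differ.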

Add best alignment score to abpoa_res_t

Hi @yangao07,

happy to see that abPOA is outperforming https://github.com/rvaser/spoa!

As indicated in #2, we are now trying to replace SPOA with abPOA in https://github.com/ekg/smoothxg/blob/57a9d568aec44986b8293ac1f33c1b128b8a0d46/src/smooth.cpp#L60.

I am stuck on the following:
When aligning a sequence to the abPOA graph, I need to know if the actual sequence or its reverse complement results in a better alignment score. I figured that best_score would deliver this, but it is not part of the abpoa_res_t. Could you add this, please?
Or ideally, abPOA already gives us this best alignment. I scanned the code, but could not find anything indicating that.
So, do you take the actual sequence and its reverse complement and perform alignments for both? Or just for the default sequence? Because we are still puzzled about this peculiar sequence in DRB1-3123. There is the rev_cigar argument, but it is only initialized as 0 and therefore does not have any effect, as far as I can tell.
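The orientation choice described above can be sketched on the caller's side as follows. `score_fn` is a hypothetical callback standing in for "align this sequence to the graph and return the best score"; the toy `identity_score` below only counts position-wise matches against a reference string and is not how abPOA scores alignments.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (N is left as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pick_orientation(seq: str, score_fn):
    """Score both strands and return (chosen_seq, is_reverse, score).

    Ties are resolved in favour of the forward strand.
    """
    fwd_score = score_fn(seq)
    rc = reverse_complement(seq)
    rev_score = score_fn(rc)
    if rev_score > fwd_score:
        return rc, True, rev_score
    return seq, False, fwd_score


def identity_score(reference: str):
    """Toy stand-in for an aligner: number of position-wise matches."""
    def score(seq: str) -> int:
        return sum(1 for a, b in zip(seq, reference) if a == b)
    return score


if __name__ == "__main__":
    print(pick_orientation("GGGTTT", identity_score("AAACCC")))
```

This doubles the alignment work per sequence, which is why a best score exposed by the aligner itself would be preferable.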

Thanks for any feedback.

Best,
Simon

CC @ekg

Segmentation fault and mis-alignment issue

np_reads.txt

The attached FASTA file (np_reads.txt) contains 1500 Nanopore reads, all in the same orientation.

Command used:
abpoa -A -r1 np_reads.txt > output.fas

  1. abPOA does not finish alignment of that many reads and stops: “Segmentation fault (core dumped)”. The RAM consumption was about 1.4 GB on my laptop, which should not be a problem (I have 32 GB). If you remove 500 reads, it works.

  2. The alignment of the first 1000 reads works, but a few of the reads are largely misaligned:
    7ec76e50-b3b6-4b7d-9937-2dd24d0f6f5c
    8d75c175-85cc-4c4b-9227-61c51427a6c5
    548a0a81-544c-4b32-88c4-2dc6651ff203
    1ff17285-2221-4a30-8d94-3322e83de346
    11055030-fd98-4aea-9fba-ce23c1b41a5c
    42b7e0c6-7b4d-461f-b7d7-a394d0f9c1cb

These misaligned reads seem to be among the longest ones. After alignment, these reads are much more similar to each other than they are to the other reads.

Cheers,
Marko

Regression in v1.5.0

I tried to update the abPOA version used by Cactus from v1.4.3 to v1.5.0, but it caused the CI tests to fail.

I've pulled out an example that is easy to reproduce:

wget -q http://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_jan12_2024.tar.gz
tar xzf abpoa_fail_jan12_2024.tar.gz
cd abpoa_fail_jan12_2024/
abpoa abpoa_fail_jan12_2024.fa -O 400,1200 -E 30,1 -b 300 -f 0.050000 -t abpoa_fail_jan12_2024.mat -r 1 -m 0 -p > abpoa_fail_jan12_2024.out
[main] CMD:  abpoa -O 400,1200 -E 30,1 -b 300 -f 0.050000 -t abpoa_fail_jan12_2024.mat -r 1 -m 0 -p abpoa_fail_jan12_2024.fa
Segmentation fault (core dumped)

This runs fine with v1.4.3.

abPOA seems unstable with HOX scoring matrix

I noticed abPOA returning some bizarre alignments when running with a scoring matrix. In my Cactus output, it would lead to a lot of cases of identical sequences getting useless gaps (e.g., a T inserted, then a T deleted a few bases later).

I tried to pull out an example to reproduce on the command line. Instead of the wrong alignment, it just crashes (hopefully it's the same underlying problem). It runs okay without -t.

wget http://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_may31.fa
./abPOA/bin/abpoa abpoa_fail_may31.fa -m 0 -r 1  -t abPOA/HOXD70.mtx -N
[simd_abpoa_align_sequence_to_subgraph] Error in cg_backtrack. (5)

In cactus, switching on the anchoring with disableSeeding=0 fixes many of these cases. But I am worried there is an underlying issue (perhaps numerical?) with scoring matrices that makes it unstable for larger subproblems.

GFA output

Have you considered adding GFA output to abPOA?

I am likely to make a patch to do this, but I wouldn't mind if you beat me to it. 🙂

Option to switch off logging messages

Would it be possible to add an option to disable the logging messages to stderr? I think they all come from err_func_format_printf() and start to dominate the logs from my client code that calls abpoa repeatedly. Thanks so much!

Is FASTQ output format supported?

Hi,
We love your tool. It is really good!
I've tried to use it on FASTQ files with quality score weighting (-Q), a feature that has been available since 1.4.1. But the output I get is still in FASTA format. Is it possible to get output in FASTQ format? Is this a feature planned for future development?
Cheers!

License

Hi!

Great work. Any chance that you would consider relicensing to a more permissive license or offer a one time relicensing?

Thank you!

segfault when generating GFA

This sequence set has pretty deep coverage, so perhaps this is causing the problem.

smoothxg_block_1012.fa.txt

In any case, I can't get it to run with GFA output:

-> % abpoa -s -r 3 smoothxg_block_1012.fa >smoothxg_block_1012.abpoa-s.gfa    
corrupted size vs. prev_size
[1]    83982 abort (core dumped)  abpoa -s -r 3 smoothxg_block_1012.fa > smoothxg_block_1012.abpoa-s.gfa

Here's a backtrace:

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7da1859 in __GI_abort () at abort.c:79
#2  0x00007ffff7e0c3ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7f36285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7e1447c in malloc_printerr (str=str@entry=0x7ffff7f3443a "corrupted size vs. prev_size") at malloc.c:5347
#4  0x00007ffff7e14aeb in unlink_chunk (p=p@entry=0x555555971800, av=0x7ffff7f67b80 <main_arena>) at malloc.c:1454
#5  0x00007ffff7e1600b in _int_free (av=0x7ffff7f67b80 <main_arena>, p=0x55555596b800, have_lock=<optimized out>) at malloc.c:4342
#6  0x00005555555619a2 in abpoa_generate_gfa ()
#7  0x000055555555a67b in abpoa_main ()
#8  0x000055555555aeb2 in main ()

This indicates that the error is probably in generate_gfa.

Meanwhile, MSA generation appears to work:

abpoa -s -r 1 smoothxg_block_1012.fa >smoothxg_block_1012.abpoa-s.msa.fa

MSA to GFA

Hello,
Thank you for this wonderful, cross-platform tool!
Is there a way to use this tool to go from an existing MSA to GFA? This would be my "starter" graph.
Then I would use your upcoming "incremental" option to add either new MSAs or new sequences to it.

make install using prefix does not set rpath

Hello

Using the tagged release archive for version 1.2.5 (https://github.com/yangao07/abPOA/releases/download/v1.2.5/abPOA-v1.2.5.tar.gz):

When abPOA is built with a prefix location, make install does not set the rpath to $PREFIX/lib64.

See:

rpmmaker:~ > wget https://github.com/yangao07/abPOA/releases/download/v1.2.5/abPOA-v1.2.5.tar.gz
rpmmaker:~ > tar xf abPOA-v1.2.5.tar.gz
rpmmaker:~ > mkdir build && cd build
rpmmaker:~/build > cmake -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/abPOA ../abPOA-v1.2.5 && make -j && make install

then

rpm_maker:build > ldd /tmp/abPOA/bin/abpoa 
	linux-vdso.so.1 (0x00007ffc44cd8000)
	libabpoa.so => not found
	libz.so.1 => /lib64/libz.so.1 (0x00007f6e99c8d000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6e99a6d000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f6e996eb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f6e99326000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6e99ea4000)
rpm_maker:abPOA-v1.2.5/build > patchelf --print-rpath /tmp/abPOA/bin/abpoa 

Maybe you should consider adding SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}") to CMakeLists.txt.

Error in cg_backtrack

When testing, I'm occasionally running into Error in cg_backtrack. The specific error code seems unstable: I'm getting anything from (0) to (5). When I take the sequences that generated the error and put them back into abpoa on the command line, I can't reproduce the error.

What do these errors indicate?

Pre-built binary program not compatible between different versions of CPUs?

Hi Yan,

We are trying to run abPOA (with the pre-built binary) in the cloud, but we found that this binary doesn't work across some versions of Intel CPUs, or between AMD and Intel CPUs.
Is there a way to bypass this issue? We are considering compiling the binary every single time before we launch jobs in the cloud, but that's not very elegant :(

Cheers,
Jiaan

make vs cmake

Hi,
is there any difference between building abpoa with make and with cmake (e.g., missing optimizations)?

Best,
Luca

Consensus coverage in diploid mode

Hi all,
first of all: great library!

I was playing around with diploid mode and I ran into strange segmentation faults (when freeing cons_cov). From what I could see, in diploid mode, cons_cov is not used at all:

abPOA/src/abpoa_graph.c

Lines 817 to 818 in 637c79f

if (abpt->is_diploid) {
abpoa_diploid_heaviest_column(abg, ABPOA_SRC_NODE_ID, ABPOA_SINK_NODE_ID, n_seq, abpt->min_freq, out_fp, cons_seq, cons_l, cons_n);

So, I'm assuming that it's not possible to get consensus coverage in diploid mode. Is this true?

Thanks,
Luca

-S option for pyabpoa

Hi,

My lab's Mandalorion and C3POa tools use pyabpoa and it works great. Just every once in a while, we encounter an overlong sequence and pyabpoa stalls for a very long time.

I've noticed that abpoa has the -S option to do seeded alignment, potentially using a lot less memory in the process.
Is that option available for pyabpoa as well? I can't seem to find that info.

Thank you,
Chris

abPOA fails to capture difference in prefix

Hi,

Thank you for developing abPOA!

I am trying to use abPOA to build consensus and cluster short TR sequences (~40-70bp in length). I've noticed that if the difference in sequences is in prefix, then abPOA identifies only one cluster instead of two.

I generated three different datasets of 20 reads using this function:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def generate_subreads(num, where = "start"):
    seqs = []
    repeat = "CTAT"
    insertion="TCTA"
    if where == "start":
        for i in range(num//2):
            seqs.append(SeqRecord(Seq(insertion + repeat*15), id = str(i), name= str(i), description=""))
    elif where == "middle":
        for i in range(num//2):
            seqs.append(SeqRecord(Seq(repeat*8 + insertion + repeat*7), id = str(i), name= str(i), description=""))
    else:
        for i in range(num//2):
            seqs.append(SeqRecord(Seq(repeat*15 + insertion), id = str(i), name= str(i), description=""))

    for i in range(num//2):
        seqs.append(SeqRecord(Seq(repeat*15), id = str(num//2 + i), name= str(num//2 + i), description=""))

    return seqs

and run:
aligner.msa(seqs, out_cons=True, out_msa=True, max_n_cons=2, min_freq=0.3)

and here is the output of the aligner.msa on these datasets:
synthetic_output.txt

Is there a problem with the abPOA algorithm in such a corner case?

Thank you,
Tatiana

abPOA puts big gap in middle of sequence

I'm aligning a short sequence to a very long one, and they look like

ATATA
ATATATATATA<...>A

When I align them with abpoa, it puts the gap in the middle

ATAT------------<...>A
ATATATATATA<...>A

But I'd much rather have an alignment like

ATATA-----------<...>
ATATATATATA<....>

I understand that, strictly speaking, under the global alignment scoring model there isn't much difference between these two scenarios, but in practice I think the second one is always preferred. Do you think it would be possible to add some way to left-shift gaps or favour gaps at the ends in global alignment mode?

Thanks, as always.

to reproduce

wget http://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_mar23.fa
abpoa abpoa_fail_mar23.fa -m 0 -r 1

Feature request: Allow specification of full scoring matrix

Only a match score and mismatch penalty are available now. Would it be possible to allow specification of a full matrix? We would like to be able to, for example, use the default lastz scoring matrix referred to here: http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.03.html#options_scoring

As far as I can tell, this just requires an additional interface, given that a full matrix seems to be used internally? Personally, I'm more interested in doing this via the C API than the command line, but I'm sure other users would like the option there too. Thanks!
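To illustrate the request, the sketch below expands a match score and mismatch penalty into a full substitution matrix and then overrides individual entries, which is what a full-matrix interface would allow. The alphabet order "ACGTN", the zero score for N, and the row-major flattening are assumptions for illustration, not abPOA's actual internal layout.

```python
ALPHABET = "ACGTN"  # assumed symbol order


def build_matrix(match: int, mismatch: int, n_score: int = 0):
    """Expand match/mismatch into a square matrix; any pairing with N gets n_score."""
    m = len(ALPHABET)
    mat = [[0] * m for _ in range(m)]
    for i, a in enumerate(ALPHABET):
        for j, b in enumerate(ALPHABET):
            if a == "N" or b == "N":
                mat[i][j] = n_score
            elif a == b:
                mat[i][j] = match
            else:
                mat[i][j] = -mismatch
    return mat


def set_pair(mat, a: str, b: str, score: int) -> None:
    """Override one symmetric entry, e.g. to score transitions differently."""
    i, j = ALPHABET.index(a), ALPHABET.index(b)
    mat[i][j] = mat[j][i] = score


def flatten(mat):
    """Row-major flat list, the shape a C array of m*m ints would take."""
    return [x for row in mat for x in row]


if __name__ == "__main__":
    mat = build_matrix(match=2, mismatch=4)
    set_pair(mat, "A", "G", -1)  # hypothetical milder penalty for a transition
    for row in mat:
        print(row)
```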

pip install fails on multiple machines

Hi,

I've been trying to install pyabpoa, and on multiple machines the install fails with

src/abpoa_align.c:6:10: fatal error: 'abpoa_seq.h' file not found
#include "abpoa_seq.h"
         ^~~~~~~~~~~~~
2 warnings and 1 error generated.
error: command '/usr/bin/clang' failed with exit status 1

Any idea what's going on there?

Thank you,
Chris

PyPI package does not install

Under Python 3.8, Linux x86 the source package is fetched (there's only a Python 3.7 wheel) but fails to install.

Collecting pyabpoa
  Using cached pyabpoa-1.4.0.tar.gz (738 kB)
  Preparing metadata (setup.py) ... done
ERROR: No .egg-info directory found in /tmp/pip-pip-egg-info-_bey3g0a

potential issue when aligning more than 1024 sequences (v1.2.4)

Talking with @glennhickey today, we discovered that we are both struggling with a segfault that appears to occur when calling abpoa_free_seq. I specifically tracked it to here: https://github.com/yangao07/abPOA/blob/master/src/abpoa_seq.c#L110. It appears that this only occurs when aligning >1024 (CHUNK_READ_N) sequences. It is interesting that the initial default allocation is for 1024 sequences:

abs->n_seq = 0; abs->m_seq = CHUNK_READ_N;

However, neither of us has been able to reproduce this using the same input and calling the command line abpoa tool. This will make it very hard for you to track down the cause.

Adjusting things to attempt to narrow down the problem, I saw errors like this that suggested that the address of one of the values to be freed had been overwritten by a number. This number is outside of the normal 48-bits used for memory addresses:

[smoothxg::smooth_and_lace] applying global abPOA to 53842 blocks:  2.15% @ 4.18e+00/s elapsed: 00:00:04:36 remain: 00:03:29:59
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4901==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f14924e8532 bp 0xd80000200003f81 sp 0x7f14744f39a0 T19)
==4901==The signal is caused by a READ memory access.
==4901==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 0x7f14924e8532 in __asan::asan_free(void*, __sanitizer::BufferedStackTrace*, __asan::AllocType) (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libasan.so.6+0x2b532)
    #1 0x7f1492569b43 in free (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libasan.so.6+0xacb43)
    #2 0xa7a3c8 in abpoa_free_seq (/export2/erikg/smoothxg/bin/smoothxg+0xa7a3c8)
    #3 0x7b6948 in abpoa_free /export2/erikg/smoothxg/deps/abPOA/src/abpoa_graph.c:142
    #4 0x7dac90 in smoothxg::smooth_abpoa(xg::XG const&, smoothxg::block_t const&, unsigned long, int, int, int, int, int, int, bool, std::unique_ptr<ska::flat_hash_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> > > > >, std::default_delete<ska::flat_hash_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> > > > > > >&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) /export2/erikg/smoothxg/src/smooth.cpp:383
    #5 0x7e9ebd in smoothxg::smooth_and_lace(xg::XG const&, smoothxg::blockset_t*&, int, int, int, int, int, int, bool, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, bool, double, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, unsigned long) [clone ._omp_fn.0] /export2/erikg/smoothxg/src/smooth.cpp:1528
    #6 0x7f1491dd2c65 in gomp_thread_start (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libgomp.so.1+0x19c65)
    #7 0x7f1491d85f63 in start_thread (/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libpthread.so.0+0x7f63)
    #8 0x7f1491cb79ae in clone (/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6+0xf69ae)

AddressSanitizer can not provide additional info.
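The observation that a value lies outside the usual 48-bit address range can be checked mechanically. The sketch below tests the pc and bp register values from the report above, assuming x86-64 with 48-bit virtual addresses, where a canonical address has bits 47 to 63 all equal.

```python
def fits_in_48_bits(value: int) -> bool:
    """True if the value needs no more than 48 bits."""
    return 0 <= value < (1 << 48)


def is_canonical_x86_64(value: int) -> bool:
    """True if bits 47..63 are all 0 or all 1 (48-bit virtual addressing assumed)."""
    top = value >> 47  # the 17 high-order bits of a 64-bit value
    return top == 0 or top == (1 << 17) - 1


if __name__ == "__main__":
    registers = {"pc": 0x7F14924E8532, "bp": 0xD80000200003F81}
    for name, value in registers.items():
        print(name, hex(value), fits_in_48_bits(value), is_canonical_x86_64(value))
```

A non-canonical value in a pointer slot is consistent with the pointer having been overwritten by unrelated data, though it does not by itself identify where the overwrite happened.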

We're both trying to find an input to abpoa that reproduces this error.

Segmentation fault error

Hi Yan,
I want to use abPOA with my ONT data to build a multiple sequence alignment, but it fails with a "Segmentation fault" error and empty output.
Here is my command:
abpoa -r 1 -S Test.fasta > Test.MSA.fasta
and my test data file
Test.zip

I wonder whether abPOA is unsuited to data with many long reads when building an MSA?

Thank you.
KT

progressive mode does not work with seeding

There seems to be a conflict between -p and -S, where abpoa exits 1 and prints no output if the two options are used together. Could you please clarify if this is a bug or intended behaviour? If it's the latter, I think it would be good to add an informative error message.

Example. t.fa

>1
AGGTTCTATGGCTATACAAATTCAGCCGCCCATGTATGAAGAGAGTTTCTTAGTTATGCAGTAGATAGCTCAATCATCAGCAGTTTTATTGCAGCACACAACATCTATCAAAGGGTATCCTCTTCTTGCCTAGAGATTAGCACTAGTGGCATGATTATGACTTGCTATGCATACTTTCGTGGGTAATCATTTTTAAATTCCTGATTTCCTTTACATTCAG
>2
AGGTTCTATATTAGTCATGTTATACAAAATCATTTGCCCATGTATTATCAGAATCTCTTAGTTATGCAGTAGATGTAGATAGCTTGATCATTGGCAGTTTTGTTGCAGTATACAACAGATGATTACATCATAAATGGTCTCTGAGGTAAGTGTATCTGGTAATCTTGTATCAAAGGGTATCCTCTTTTTGCCTAGACATTAGCGCCAGTGGCATCATCATGATGCTATGCTACTTTCATTGCAGAGAGCCGGGGTAATCGTTTCTGAATTCCTGAGTTCCTTTGCATTCAG

It works fine with -p or -S by themselves, and with neither:

abpoa t.fa -m 0 -r 1
[main] CMD:  abpoa -m 0 -r 1 t.fa
>Multiple_sequence_alignment
AGGTTCTAT----G----GCTATACAAATTCAGCCGCCCATGTATGAAGAGAGTTTCTTAGTTATGCA------GTAGATAGCTCAATCATCAGCAGTTTTATTGCAGCACACAACA------------------------------------------------TCTATCAAAGGGTATCCTCTTCTTGCCTAGAGATTAGCACTAGTGGCATGATTATGACTTGCTATGCATACTTTCGT------------GGGTAATCATTTTTAAATTCCTGATTTCCTTTACATTCAG
AGGTTCTATATTAGTCATGTTATACAAAATCATTTGCCCATGTATTATCAGAATCTCTTAGTTATGCAGTAGATGTAGATAGCTTGATCATTGGCAGTTTTGTTGCAGTATACAACAGATGATTACATCATAAATGGTCTCTGAGGTAAGTGTATCTGGTAATCTTGTATCAAAGGGTATCCTCTTTTTGCCTAGACATTAGCGCCAGTGGCATCATCATGA--TGCTATGC-TACTTTCATTGCAGAGAGCCGGGGTAATCGTTTCTGAATTCCTGAGTTCCTTTGCATTCAG
[abpoa_main] Real time: 0.001 sec; CPU: 0.001 sec; Peak RSS: 0.003 GB.
abpoa t.fa -m 0 -r 1 -S
[main] CMD:  abpoa -m 0 -r 1 -S t.fa
>Multiple_sequence_alignment
AGGTTCTAT----G----GCTATACAAATTCAGCCGCCCATGTATGAAGAGAGTTTCTTAGTTATGCA------GTAGATAGCTCAATCATCAGCAGTTTTATTGCAGCACACAACA------------------------------------------------TCTATCAAAGGGTATCCTCTTCTTGCCTAGAGATTAGCACTAGTGGCATGATTATGACTTGCTATGCATACTTTCGT------------GGGTAATCATTTTTAAATTCCTGATTTCCTTTACATTCAG
AGGTTCTATATTAGTCATGTTATACAAAATCATTTGCCCATGTATTATCAGAATCTCTTAGTTATGCAGTAGATGTAGATAGCTTGATCATTGGCAGTTTTGTTGCAGTATACAACAGATGATTACATCATAAATGGTCTCTGAGGTAAGTGTATCTGGTAATCTTGTATCAAAGGGTATCCTCTTTTTGCCTAGACATTAGCGCCAGTGGCATCATCATGA--TGCTATGC-TACTTTCATTGCAGAGAGCCGGGGTAATCGTTTCTGAATTCCTGAGTTCCTTTGCATTCAG
[abpoa_main] Real time: 0.002 sec; CPU: 0.004 sec; Peak RSS: 0.003 GB.
abpoa t.fa -m 0 -r 1 -p
[main] CMD:  abpoa -m 0 -r 1 -p t.fa
>Multiple_sequence_alignment
AGGTTCTAT----G----GCTATACAAATTCAGCCGCCCATGTATGAAGAGAGTTTCTTAGTTATGCA------GTAGATAGCTCAATCATCAGCAGTTTTATTGCAGCACACAACA------------------------------------------------TCTATCAAAGGGTATCCTCTTCTTGCCTAGAGATTAGCACTAGTGGCATGATTATGACTTGCTATGCATACTTTCGT------------GGGTAATCATTTTTAAATTCCTGATTTCCTTTACATTCAG
AGGTTCTATATTAGTCATGTTATACAAAATCATTTGCCCATGTATTATCAGAATCTCTTAGTTATGCAGTAGATGTAGATAGCTTGATCATTGGCAGTTTTGTTGCAGTATACAACAGATGATTACATCATAAATGGTCTCTGAGGTAAGTGTATCTGGTAATCTTGTATCAAAGGGTATCCTCTTTTTGCCTAGACATTAGCGCCAGTGGCATCATCATGA--TGCTATGC-TACTTTCATTGCAGAGAGCCGGGGTAATCGTTTCTGAATTCCTGAGTTCCTTTGCATTCAG
[abpoa_main] Real time: 0.002 sec; CPU: 0.006 sec; Peak RSS: 0.003 GB.

but not with -p and -S together:

abpoa t.fa -m 0 -r 1 -p -S
[main] CMD:  abpoa -m 0 -r 1 -p -S t.fa
[abpoa_add_graph_sequence] seq_l: 0	start: 0	end: 0.
echo $?
1

Align expanded sequences, trim them and then compute the consensus sequence

Hi Yan,

I've expanded a few sequences with flanking nucleotides, then aligned them, obtaining the MSA and consensus sequence.

However, after aligning the expanded sequences, I would like to first delete the additional nucleotides and then calculate the consensus sequence with only the unexpanded sequences, without having to recalculate the whole alignment.

Is it possible?
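One way to approximate this outside abPOA is to post-process the MSA rows: mask the known number of flanking bases in each row, then take a column-wise majority vote. The sketch below assumes the flank lengths are known per call and uses a plain majority vote, which is not the consensus algorithm abPOA itself uses; the function names are hypothetical.

```python
from collections import Counter


def mask_flanks(row: str, left: int, right: int, gap: str = "-") -> str:
    """Replace the first `left` and last `right` non-gap bases of an MSA row with gaps."""
    chars = list(row)
    base_idx = [i for i, c in enumerate(chars) if c != gap]
    tail_start = max(len(base_idx) - right, 0)
    for i in base_idx[:left] + base_idx[tail_start:]:
        chars[i] = gap
    return "".join(chars)


def majority_consensus(rows, gap: str = "-") -> str:
    """Column-wise majority vote; columns where the gap wins are dropped."""
    consensus = []
    for column in zip(*rows):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    msa = ["GGACGTCC", "GGACGTCC", "GGACTTCC"]
    trimmed = [mask_flanks(r, left=2, right=2) for r in msa]
    print(trimmed)
    print(majority_consensus(trimmed))
```

Because the existing alignment columns are reused, no realignment is needed; only the vote is recomputed.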

abPOA as a library -- adding nodes to the graph

Hello, I am trying to use abpoa as a dependency for one of my projects (https://github.com/HopedWall/rs-vgaligner), where I need to align a (sub)graph against some query sequences.

In my project each node is labelled with a sequence, so I'm trying to replicate the same graph structure in abpoa. In the abpoa.h file, I noticed the abpoa_add_graph_node function, however it seems to only accept a single base. Is there any way to create nodes with longer sequences?
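If nodes can only hold a single base, one workaround is to expand each sequence-labelled node into a chain of single-base nodes before building the graph. The sketch below does this in plain Python on a dictionary of node sequences and a list of edges; it makes no abPOA calls, and the data layout is an assumption for illustration.

```python
def expand_to_single_base(nodes, edges):
    """Expand sequence-labelled nodes into chains of single-base nodes.

    nodes: dict mapping node ID -> non-empty sequence string
    edges: list of (from_id, to_id) pairs between the original nodes
    Returns (bases, base_edges): bases[i] is the base of new node i, and
    base_edges lists (from, to) pairs over the new node indices.
    """
    bases = []
    first, last = {}, {}
    base_edges = []
    for node_id, seq in nodes.items():
        if not seq:
            raise ValueError(f"node {node_id!r} has an empty sequence")
        start = len(bases)
        bases.extend(seq)
        first[node_id] = start
        last[node_id] = len(bases) - 1
        # Chain consecutive bases inside one original node.
        base_edges.extend((i, i + 1) for i in range(start, len(bases) - 1))
    # An original edge connects the last base of u to the first base of v.
    base_edges.extend((last[u], first[v]) for u, v in edges)
    return bases, base_edges


if __name__ == "__main__":
    nodes = {"a": "AC", "b": "G", "c": "T"}
    edges = [("a", "b"), ("a", "c")]
    print(expand_to_single_base(nodes, edges))
```

Keeping the `first` and `last` index maps around would also allow alignment results on the expanded graph to be mapped back to the original node IDs.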

Thank you!

header installation :: bad location

Hello

Using the tagged release archive for version 1.2.5 (https://github.com/yangao07/abPOA/releases/download/v1.2.5/abPOA-v1.2.5.tar.gz):

make install installs the headers in a non-standard location and also installs unnecessary files (*.c).
This makes the example unbuildable against the installed headers.
See:

rpmmaker:~ > wget https://github.com/yangao07/abPOA/releases/download/v1.2.5/abPOA-v1.2.5.tar.gz
rpmmaker:~ > tar xf abPOA-v1.2.5.tar.gz
rpmmaker:~ > mkdir build && cd build
rpmmaker:~/build > cmake -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/abPOA ../abPOA-v1.2.5 && make -j && make install

When we check the installed include directory, we find:

rpm_maker:~/build > ls /tmp/abPOA/include -R
/tmp/abPOA/include:
src

/tmp/abPOA/include/src:
abpoa.c        abpoa_graph.c  abpoa_seed.h  kalloc.h  ksort.h    simd_abpoa_align.c  utils.c
abpoa.h        abpoa_graph.h  abpoa_seq.c   kdq.h     kstring.c  simd_abpoa_align.h  utils.h
abpoa_align.c  abpoa_plot.c   abpoa_seq.h   khash.h   kstring.h  simd_check.c
abpoa_align.h  abpoa_seed.c   kalloc.c      kseq.h    kvec.h     simd_instruction.h

We expect the headers to be placed directly in include, or maybe in include/abPOA.

This is due to the way CMakeLists.txt defines the install rule for include files:

install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/src DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})

should be

install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include/ DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} FILES_MATCHING PATTERN "*.h")

regards

Eric

Expected behavior of the extension alignment mode.

Thanks for publishing this method!

I'm trying to clarify whether the 'extension' alignment mode is supposed to behave like an 'overlap' style of scoring, allowing for one free end for both text and query.

To illustrate, if I have two sequences in ext1.fa:

>1
ACGTTGCCCGTTAAGGGT
>2
CCGTTAAGGGTATGTCCC

Aligning these using local alignment mode happens to give an extension/overlap style alignment in this case:

[main] CMD:  abpoa -m1 -r1 ext1.fa
>1
ACGTTGCCCGTTAAGGGT-------
>2
-------CCGTTAAGGGTATGTCCC

...but switching to the extension scoring mode, it is harder to interpret what this mode is doing:

[main] CMD:  abpoa -m2 -r1 ext1.fa
>1
ACGTT-------------GCCCGTTAAGGGT
>2
CCGTTAAGGGTATGTCCC-------------

All examples are from the current main (d2e0186).

Can you clarify what the expected constraints for extension mode are? Is any configuration of abPOA constrained to give the 'overlap' style I show above?
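For reference, the 'overlap' layout described above can be reproduced for this exact-match case with a simple suffix-prefix search. The sketch below only handles exact overlaps and is not an alignment algorithm; it is meant to pin down what the overlap-style output looks like for ext1.fa.

```python
def longest_suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (exact match)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_layout(a: str, b: str):
    """Pad both sequences with gaps so the exact overlap is stacked in the same columns."""
    k = longest_suffix_prefix_overlap(a, b)
    row_a = a + "-" * (len(b) - k)
    row_b = "-" * (len(a) - k) + b
    return row_a, row_b


if __name__ == "__main__":
    for row in overlap_layout("ACGTTGCCCGTTAAGGGT", "CCGTTAAGGGTATGTCCC"):
        print(row)
```

For the two sequences in ext1.fa this yields the same rows as the -m1 output shown above.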

abPOA user specifiable seeds

Hi @yangao07, I've been experimenting a little with the seeding in abpoa and am wondering if it would be possible to add an option for users to provide alignment seeds? My issue is that for more divergent sequences, minimizers are not ideal for anchoring. I have had more luck using maximal unique matches (MUMs), with a chaining process more like that in the original MUMmer program. Looking forward, I also see a time when we will want to anchor the alignments on unique markers in order to facilitate the alignment of highly repetitive sequences (e.g. satellite arrays). Interested in your perspective on this.

Sequence-dependent, reproducible crash in pyabpoa when repeatedly instantiating `pyabpoa.msa_aligner`

Using pyabpoa 1.4.3.

Full test script:

import pyabpoa

good_seqs = [
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATGACCAGCGTATAAACAGTCTACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACGGGTGGTTTAGTGTGTGTTTCTATGATGGAGAGAGGAGGTTCAGTGTGGGATTGATGAGATACAGTGATGTGTGGAAGTTGGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCTGTGGTTTGGAGATGATAGACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTAGGAGACTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACTGGTGGGATGGGTTGTTTAACTAGCAATTACATAACAGATGGGATGTGATTTGTTAGGAACTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACTGGGATAGTATGTGGAAAGTCTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCGAATGTTGGGAGTAGAAGGTCGATGAAGATTGAGGGAAGAACGGAGTAGTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCGATGATGTAGTAGTAAGGGTCGTGAAGTGGAAGGTGAGATTCAGGAGGAGGGTAATGATAGACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGTAAGCGTCAGAGGAGCCAAATTCAAAAAAGCCTGCTTTCTAGCAGGCTTTTTGCTTTCTAATGGAAGCATAAAAAAATGGCGCCGATGGGCGCCATTTTTCACTGCGGCAAGAATTACTTCAGACAGAGTATCAAAAGGCGAAACCTCCGCAATGCGGAGGTTTCTTTTTAAAGACCTTATTAATCGGTCGGCAGATGCTGGGTGATAAAATGACGGGTCAGGCTATTTGCACGGGTTTTATCCAGCGGTTTACCGCTCAGTGCTGCACGCAGCCACAGACCATCGATCAGTGCTGCCAGACCATAACCTGCCTCTTGTGCCTGTTCACGAGGCAGTTCACGACGAAATTCGCTAACCAGATTGCTCAGCAGACGACGACTGCTAACCTGCTGCAGACGATACAGCATCGGCTGATGCATGCTGATTGCCCAAAATGCAGCCATGCTTTCATTGCTGCGCTGCTAACCTGGGTTTCATCAAAATTACCACCAACAATTGCCTGCAGACGCTGTTCTGCGCTACCCTGCGGCAGTGCATGCAGACGATTCAGAACTGCATCACGCAGCTGGCTGGTAATATCACGCATGGTTGCTTCCAGTAGACCGTTTTTATCGCGGAAATAATGGCTAATAATACCGGTGCTAACACCGGCACGACGTGCAATCTGTGCAATGGTTGCATCATGCATACCAACTTCATTAATTGCTTCCAGGGTTGCATCAATCAGCTGACGACGACGAATGCTCTGCATACCCAGTTCGGCATTCGTGAACCCCCTTTCTATGTTATGGTTAAACAAAATTATTTGTCAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAACCTTAACGATACGGTACGTTTCGTATCATGTCAATTGGTAACGAATCAGATTCCACCGTACGTCGCTCCTATTTAAGCAAAAAAAACCCCGCCCTGTCAGGGGCGGGGTTTTTTTTTTCTTTGGGTATAGCGTCGTGGACAGTCATTCATCTTTCT
GCCCCTCCAAAAGCAAAAACCCGCCGAAGCGGGTTTTTACGTAAATCAGGTGAAACTGACCGATAAGCCGGACCTTATTAACACTGTGTACCCGGACAAACACCATTAATCAGCAGGAAGGTAAATTCTTCAATATCCTGTTCAACGGTCAACTGTTCGGTCAGCAGGCGATACCAACAAAAACCAAAAATCATATCCAGCAGCAGTTCACGATTGATATCTTTCGGCAGTTCACCATTGCTAATGGCATCTTCAACCAGTTTTTCGGTATCTCACGACGACGTTCCATAAACTGATCTTTCAGTTGGGTCAGGGTTACAGGGTCCAACTGTGCTTCTGCAATAACACAACGAAATGCTTCACCACAAATGGTTTCACGCCAAACTTTCCACAGATTATGCAGCAGAAAATCCAGATCGGCTTTAAAGCTACCCAAATCCGGAAATTTACGTACCTGTTCGATTTCATTTTCATACACTTCGGCAATCAGTGCTGCTTTGTTGGTCCACCAACGATAAATGGTCGGTTTGCCTGCACCGGCGCGACGTGCCACGCTTTCAATGCTCAGACCGCTATAACCACATTCTTTCAGGATTTCAATGGTGCTGGTCAGAATTGCTTTATGGGTATGCGGACTACGCAGGCTACCAATGCTGCTACGGCTCGGGGTACGTGCATGCGTAAAAATTGCTTCTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAAATTGTGAGCGCTCACAATTCCACACATTATACGAGCCGATGATTAATTGGAGCGCTTCCTTCAAAGCGGACCAAAACGAAAAAAGGCCCCCCTTTCGGGAGGCCTCTTTTCTGGAATTTGGTACCGAGTGCAGACGTAAAAAAAGCGGCGTGGTTAGCCGCTTTTTTAATTGCCGGACCTTATTACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCAGTGAGACTGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGCTATCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCCTCATGGGAGTAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAAGCATCGTAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCACCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTGGTC
GGACGCCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCATTCGGGAACCCCCTTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAGCTAGCATTATATTGAACGTCCAATCAATTGTCTATTGGTAACGAATCCCTCTCACCCGCGCTCTCCTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGA",
    "CACGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGAAACTTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATGACCAGCGTATAAACAGTCTACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACGGGTGGTTTAGTGTGTGTTTCTATGATGGAGAGAGGAGGTTCAGTGTGGGATTGATGAGATACAGTGATGTGTGGAAGTTGGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCTGTGGTTTGGAGATGATAGACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTACGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACTGGTGGGATGGGTTGTTTAACTAGCAATTACATAACAGATACGGGATGTGATTTGTTAGGAACTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACTGGGATAGTATGTGGAAAGTCTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCGAATGTTGGGAGTAGAAGGTCGATGAAGATTGAGGGAAGAACGGAGTAGTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCGATGATGTAGTAGTAAGGGTCGTGAAGTGGAAGGTGAGATTCAGGAGGAGGGTAATGATAGACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAAGGTATACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGTAAGATAACAGATACTTCGGTATCTGTTAATCCCTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATTATTATTCTGGCTCCTCAGTAAGCGTCAGAGGAGCCAAATTAAAAAAGCCTGCTTTCTAGCAGGCTTTTTGCTTTCTAATGGGAGCATAAAAAAATGGCGCCGATGGGCGCCATTTTTCACTGCGGCAAGAATTACTTCAGACAGAGTATCAAAAGGAAACCTCCGCAATGCGGAGGTTTCTTTTTAAAGACCTTATTAATCGGTCGGCAGATGCTGGGTGATAAAATGACGGGTCAGGCTATTTGCACGGGTTTTATCCAGCGGTTTACCGCTCAGTGCTGCACGCAGCCACAGACCATCAATCAGTGCTGCCAGACCATAACCTGCCTCTTGTGCCTGTTCACGAGGCAGTTCACGACGAAATTCGCTAACCAGATTGCTCAGCAGACGACGACTGCTAACCTGCTGCAGACGATACAGCATCGGCTGATGCATGCTGATTGCCCAAAATGCCAGCCATGCTTTCATTGCTGCGCTGCTAACCTGGGTTTCATCAAAATTACCACCAACAATTGCCTGCAGACGCTGTTCTGCGCTACCCTGCGGCAGTGCATGCAGACGATTCAGAACTGCATCACGCAGCTGGCTGGTAATATCACGCATGGTTGCTTCCAGTAGACCGTTTTTATCGCGGAAATAATGGCTAATAATACCGGTGCTAACACCGGCACGACGTGCAATCTGTGCAATGGTTGCATCATGCATACCAACTTCATTAATTGCTTCCAGGGTTGCATCAATCAGCTGACGACGACGAATGCTCTGCATACCCAGTTTCGGCATTCGTGAACCCCCTTTCTATGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAACCTTAACGATACGGTACGTTTCGTATCATGTCAATTGGTAACGAATCAGATTCCACCGTACGTCGCTCCTATTTAAGCAAAAAAACCCCGCCCTGTCAGGGGCGGGGTTTTTTTTTTCTTTTGGGTATAGCGTCGTGGACAGTCA
TTCATCTTTCTGCCCCTCCAAAAGCAAAAACCCGCCGAAGCGGGTTTTTACGTAAATCAGGTGAAACTGACCGATAAGCCGGACCTTATTAACACTGTGTACCCGGACAAACACCATTAATCAGCAGGAAGGTAAATTCTTCAATATCCTGTTCAACGGTCAACTGTTCGGTCAGCAGGCGATACCAACAAAAACCAAAAATCATATCCAGCAGCAGTTCACGATTGATATCTTTCGGCAGTTCACCATTGCTAATGGCATCTTCAACCAGTTTTTCGGTATCTCACGACGACGTTCCATAAACTGATCTTTCAGTTGGGTCAGGGTTACAGGGTCCAACTGTGCTTCTGCAATAACACAACGAAATGCTTCACCACAAATGGTTTCACGCCAAACTTTCCACAGATTATGCAGCAGAAAATCCAATATCGGCTTTAAAGCTACCCAAATCCGGAAATTTACGTACCTGTTCGATTTCATTTTCATACACTTCGGCAATCAGTGCTGCTTTGTTGGTCCACCAACGATAAATGGTCGGTTTGCCTGCACCGGCGCGACGTGCCACGCTTTCAATGCTCAGACCGCTATAACCACATTCTTTCAGGATTTCAATGGTGCTGGCGAGAATCGCTTTATGGGTATGCGGACTACGCAGGCTACCAATGCCGCCACACCGGCTCGGGGTACGTGCCATTAGTGGACCCCCTTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAAATTGTGAGCGCTCACAATTCCACACATTATACGAGCCGATGATTAATTGTCAACACTCCTTCAAAGCGGACCAAAACGAAAAAAGGCCCCCCTTTCGGGAGGCCTCTTTTCTGGAGTTTGGGGACCGAGTGCAGACGTAAAAAAAGCGGCGTGGTTAGCCGCTTTTTTAATTGCCGGACCTTATTACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCGACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTCTTTCACCAGTGAGACTGGCAACAGCTGATTGCCCTTCACCGCCTGGCCAGAGAGTTGCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGCTATCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTATCGGCTGGATTGATTTCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGACAGAACTTAATGGGGCCTGCTAACAGCGCGATTTGCTGGTGACCAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCCTCATGGGAGTAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTTATCCAGCGGATAGTTAATGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGCGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTGGGCATGTAATTCAGCTCCACCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCACCACGCG
GGAAACGGTCATATAAGAGACACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCATTGGAACCCCCTTTCTTCGTTATGGTTAAACAAAATTATTTGTAGAGGCGGCGTTGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAGCTAGCATTATATTGAACGTCCAATCAAATCGCTTGATTGGTAACGAATCCCTCTCACCCGCGCTCTCCTCGTCGACCTTGATGTTTCCAGCGCGATTCGAGGACCTTCAGCG",
    "TCTGCCCTCTACTGGTTCAGACGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATGACCAGCGTATAAACAGTCTACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACGGGTGGTTTAGTGTGTGTTTCTGATGGGAGAGAGGAGGTTAAGATGTCAGATGATGAGATACAGTGATGTGGGAGAAGTGGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCTGTGGTTTGGAGATGATAGACTGTGATGGAAGTTAGAGGGTCGGTTGGGGGTGGGGAGAGTATTCGAGAGTTGTATGTTAGGGTAAGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACTGGTGGGATGGGTTGTTTAGATACATAACAGATACGGGATGTGATTTGTAGGAACTTATTGGTGGTGTAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACTGGGATAGTATGTGGAAAGTCTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCGAATGTTGGGAGTAGAAGGTCGATGAAGATTGAGGGAAGAACGGAGTAGTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCGATGATGTAGTAGTAAGGGTCGTGAAGTGGAAGGTGAGATTCAGGAGGAGGGTAATGATAGACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGTTTTTTGGGAAAGATGACACAATTCTCGGTATCTGTTATCTGTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATATATTTTGGTGGAAGGGCTCGGAGTTGTGGTAAATCTACTGTATCGCTCCTCAGTAAGCATCAAGAGAGCCAAATTCAAAAAAGCCTGCTTTCTAGCAGGCTTTTTGCTTTCTAATGGAAGCATAAAAAAATGGCGCCGATGGGCGCCATATTCACTGCGGCAAGAATTACTTCAGACAGAGTATCAAAAGGCGAAACCTCCGCAATGCGGAGGTTTCTTTTTTAAAGACCTTATTAATCGGTCGGCAGATGCTGGGTGATAAAATGACGGGTCAGGCTATTTGCACGGGTTTTATCCAGCGGTTTACCGCTCAGTGCTGCACGCAGCCACAGACCATCAATCAGTGCTGCCAGACCATAACCTCCTCTTGTGCCTGTTCACGAGGCAGTTCACGACGAAATTCGCTAACCAGATTGCTCAGCAGACGACGACTGCTAACCTGCTGCAGACGATACAGCATCGGCTGATGCATGCTGATTGCCCAAATGCCAGCCATGCTTTCATTGCTGCGCTGCTAACCTGGGTTTCATCAAAATTACCACCAACAATTGCCTGCAGACGCTGTTCTGCGCTACCCTGCGGCAGTGCATGCAGACGACTCAGAACTGCATCACGCAGCTGGCTGGTAATATCACGCATGGTTGCTTCCAGTAGACCGTTTTTATCGCGGAAATAATGGCTAATAATACCGGTGCTAACACCGGCACGACGTGCAATCTGTGCAATGGTCGGTCGTCCATACCAACTTCATTAATTGCTTCCAGGGTTGCATCAATCAGCTGACGACGACGAATGCTCTGCATACCCAGTTCAGGCATTCGTGAACCCCCTTCTATGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCATCCTTCAGGACTGAGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAACCTTAACGATACGGTACACTTCCGTATCATGTCAATTGGTAACGAATCAGATTCCACCGTACGTCGCTCCTATTTAAGCAAAAAAAACCCCGCCCTGTCAGGGGCGGGGTTTTTTTTTCTTTTGGGTATAGCGTCGTGGACAG
TCATTCATCTTTCTGCCCCTCCAAAAGCAAAAACCCGCCGAAGCGGGTTTTTACGTAAATCAGGTGAAACTGACCGATAAGCCGGACCTTATTAACACTGTGTACCGGACAAACGCGGATTAATCAGCAGGAAGGTAAATTCTTCAATATCCTCAACGGTCAACTGTTCGGTCAGCAGGCGATACCAACAAAAACCAAAAATCATATCCAGCAGCAGTTCACGATTGATATCTTTCGGCAGTTCACCATTGCTAATGGCATCTTCAACCAGTTCGGTATCTCACGACGACGTTCCATAAACTGATCTTTCAGTTGGGTCAGGGTTACAGGGTCCAACTGTGCTTCTGCAATAACACAACGAAATGCTTCACCACAAATGGTTTCACGCAGAACTTTCCACAGATTATGCAGCAGAAAATCCAGATCGGCTTTAAAGCTACCCAAATCCGGAAATTTACGTACCTGTTCGATTTCGTTTTTCATACACTTCGGCAATCAGTGCTGCTTTGTTGGTCCACCAACGATCAATGGTCGGTTTGCCTGCACCAGCACGACGTGCCACGCTTTCAATGCTCAGACCGCTATAACCACATTCTTTCAGGATTTCAATGGTGCTGGTCAGAATTGCTTTATGGGTATGCGGACTACGCAGGCTACCAATGCTGCTACGGCTCGGGGTACGTGCGGTTGGTCGGACCCCCTTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTGGTCCTTTTAGGGACTCGTCAGTGTACTGATACAAGTAGACAGCGCTAGTAAATTGTGAGCGCTCACAATTCCACACATTATACGAGCCGATGATTAATTGTCAACACTCCTTCAAAGCGGACCAAAACGAAAAAAGGCCCCCCTTTCGGGAGGCCTCTTTTACTGGAATTTGGTACCGAGTTGCAGCATAAAAAGCGGGGCGTCATCCTTCGCTTTTTTAATTGCCGGACCTTATTACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGATTTGTATTGGGCGCCAGGGTGGTTTTTCACCAGTGAGACTGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGTTGCAGCAAGCGGTCCACGCCGTTGGTTTTTCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGCTATCTTCGGTATCGTCGTATCCACTGCAGGAATCCGCGCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACCCAGTCGCCTTCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCCAGACGCGCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTCGAGGTGACCCAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCCTCATGGGAGTAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGATCAGCCCACTGACGCGTTGCGCAGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATAGGCGCCGAGGTTAAATAGCCACAACGATTTCAGCGCGGCACGTCCACCGACTTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCACCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACGTGGCTGCTCCTGGTTCACCACGCGGGAAA
CGGTCATATAGACACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCATTCGGGAACCCCCTTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAGCTAGCATTATATTGAACGTCCAATCAATCTATCTCTATTGGTAACGAACCCCTCTCACCCGCGCTCTCCTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCATGAGCAATAC",
    "TATGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATGACCAGCGTATAAACAGTCTACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACGGGGTGGTTCAGCGTGTGTTTCTATGATGGAGAGAGGAGGTTCAGTGTGGGATTGATGAGATACAGTGATGTGTGGAAGTTGGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCTGTGGTTTGGAGATGATAGACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTACGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACTGGTGGGATGGGTTGTTTAACTAGCAATTACATAACAGATACGGGATGTGATTTGTTAGGAACTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACTGGGATAGTATGTGGAAAGTCTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCGAATGTTGGGAGTAGAAGGTCGATGAAGATTGAGGGAAGAACGGAGTAGTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCGATGATGTAGTAGTAAGGGTCGTGAAGTGGAAGGTGAGATTCAGGAGGAGGGTAATGATAGACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGGAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTCTTCAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGGTGGTAATCTATGATTCTGGCTCCTCAGTAAGCGTCAGAGGTACGGTAATTTGAAAAAAGCCTGCTTTCTAGCAGGCTTTTTGCTTTCTAATGGAAGCATAAAAAAATGGCGCCGATGGGCGCCATTTTTCACTGCGGCAAGAATTACTTCAGACAGAGTATCAAAAGGCGAAACCTCCGCAATGCGGAGGTTTCTTTTTAAAGACCTTATTAATCGGTCGGCAGATGCTGGGTGATAAAATGACGGGTCAGGCTATTTGCACGGGTTTTATCCAGCGGTTTACCGCTCAGTGCTGCACGCAGCCACAGACCATCAATCAGTGCTGCCAGACCATAACCTGCCTCTTGTGCCTGTTCACGAGGCAGTTCACGACGAAATTCGCTAACCAGATTGCTCAGCAGACGACGACTGCTAACCTGCTGCAGACGATACAGCATCGGCTGATGCATGCTGATTGCCCAAAATGCCAGCCATGCTTTCATTGCTGCGCTGCTAACCTGGGTTTCATCAAAATTACCACCAACAATTGCCTGCAGACGCTGTTCTGCGCTACCCTGCGGCAGTGCATGCAGACGATTCAGAACTGCATCACGCAGCTGGCTGGCATCACGCATGGTTGCTTCCAGTAGACCGTTTTTATCGCGGAAATAATGGCTAATAATACCGGTGCTAACACCGGCACGACGTGCAATCTGTGCAATGGTTGCATCATGCATACCAACTTCATTAATTGCTTCCAGGGTTGCATCAATCAGCTGACGACGACGAATGCTCTGCATACCCAGTTTCGGCATTGGTGCACTCTCTTTCCATGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAACCTTAACGATACGGTACGTTTCGTATCATGTCAATTGGTAACGAATCAGATTCCACCGTACGTCGCTCCTATTTAAGCAAAAAAAAACCCCGCCCTGTCAGGGGCGGGGTTTTTTTTTCTTTTGGGTATAGCGTCGTGGACAG
TCATTCATCTTTCTGCCCCTCCAAAAGCAAAAACCCGCCGAAGCGGGTTTTTACGTAAATCAGGTGAAACTGACCGATAAGCCGGACCTTATTAACACTGTGTACCCGGACAAACACCATTAATCAGCAGGAAGGTAAATTCTTCAATATCCTGTTCAACGGTCAACTGTTCGGTCAGCAGGCGATACCAACAAAAACCAAAAATCATATCCAGCAGCAGTTCACGATTGATATCTTTCGGCAGTTCACCATTGCTAATGGCATCTTCAACCAGTTTTTCGGGTATCTCACGACGACGTTCCATAAACTGATCTTTCAGTTGGGTCAGGGTTACAGGGTCCAACTGTGCTTCTGCAATAACACAACGAAATGCTTCACCACAAATGGTTTCACGCCAAACTTTCCACAGATTATGCAGCAGAAAATCCAGATCGGCTTTAAAGCTACCCAAATCCGGAAATTTACGTACCTGTTCGATTTCATTTTCATACACTTCGGCAATCAGTGCTGCTTTGTTGGTCCACCAACGATAAAATGGTCGGTTTGCCTGCACCGGCGCGACGTGCCACGCTTTCAATGCTCAGACCGCTATAACCACATTCTTTCAGGATTTCAATGGTGCTGGTCAGAATTGCTTTATGGGTATGCGGACTACGCAGGCTACCAATGCTGCTACGGCTCGGGGTACGTGCCATTAGTGGACCCCCTTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAAATTGTGAGCGCTCACAATTCCACACATTATACGAGCCGATGATTAATTGTCAACACTCCTTCAAAGCGGACCAAAACGAAAAAAGGCCCCCCTTTCGGAGGCCTCTTTTCTGGAATTTGGTACCGAGTGCAGACGTAAAAAAAGCGGCGTGGTTAGCCGCTTTTTTAATTGCCGGACCTTATTACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTCACCAGTGAGACTGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGCTATCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCCTCATGGGAGTAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGATCAGCCCACTGACGCGTTGCGCGAGAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCACCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCACCA
CGCGGGAAACGGTCATATAAGAGACACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCATTCGGGAACCCCCTTCTTTGTTATGGTTAAACAAAATTATTTGTAGAGGCGGTGTTTCGTCCTTTAGGGACTCGTCAGTGTACTGATACAAGTACAGACAGCGCTAGTAGCTAGCATTATATTGAACGTCCAATCAATTGTCTATTGGTAACGAATCCCTCTCACCCGCGCTCTCCTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGC",
]

bad_seqs = [
    "CAGTGCTCTTGTGGGTCCGATACGCAGTGGTAGGAACTACATAATGAGACTAGCTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGGAGGAGGTTCAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGAGAGTCAGATGAGTTCGAAGGTTGGAGAGAAGATGTAGTAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGAGTGTAGTGAGTGTGGTTCGAGTTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTAGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATGTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCAGTGCGATTGAGGACCTTCAGTGCAGCAATACG",
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTGGCTCTAGGGACGAACGTTAGCAGCACTATTATTGGGCCATATTCCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGGAAGGAAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGGTTCAGGAGGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTGAGTTGATGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATATATTAAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGT",
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGGTCGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGGATGGATTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACTGCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATATATTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTAGGCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATA",
    "CAGTGCTCTTGCATCGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTCAGCACTATTCTTGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATGAGAAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTCAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATATCTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTCATGGGAGAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCAACCTCGATGTTTCCAGTGCGATTGAGGACCTTCAGTGGCAATACGTA",
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTAGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATATATTTTTTTTCAACAGTTAGCCGCGTCGCCAGCTATGTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACA",
    "CAGTGTATTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTTGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATAGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTCAGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGAGAGTCTTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTAAATGGTTTTCAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATAGAGTTGACTTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTCTGTTTTTTTCAACAGTTAGCCGCGTCGCGCGGCTATCTGTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCATCCGATCAAGAAACCTGTCGTGCAGCAATACGTT",
    "CAGTGCTCTTGTGGGTCCGATCGCCAGATGATAAGGAACTACATAATACAGACTAGCTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGAGAGAAGATAGGTGACAGAGTGGAGTAGTAGTGGAGTCTGGGAGGATTGAGATAGTAGACTAGAAGGTTGGAGAGAGATGTAGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTTCGAGAGTTGTATGTTAGGGTGAGAGATTAGAGATGAGTTGGATGATCTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTGCAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTGAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTCCGACTCTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAGGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTCTGTGTTTTTTGAACAGTTGCACATTGACGCAGCTATATATTTTAATGGAAGGGCTAGGAGTTGTGGTAATCTTGTGTTCCTGGCTCCTCAGCGGTCTTTAGTCAACCTAGATGCTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACG",
    "TATGTATTGCTGTCCGATTCGCCAGATGATAAGGAACTACATAATATGTATCTACTATAGGGACGAACGTTAGCAGCACTATCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGAAGTAGTGGAGTCTGGGAGGATTGAGATGAGTCCAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATCGAGAGTTGTACGTCAGGGGTGTGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGTACTTATGCTATTGGTGGGTGTAGAGAGCGGATGTTATGCAAGTGTAATTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGGTATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCAAACGGTCTAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAATCCTCCTTTCAGTGC",
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTATTGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCAGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATGTG",
    "CAGTGCTCTTGTGGGTCCGATTCGCCAGATGGTGGGGAATTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGAGAAGTAGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTGCAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATCAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGGTTGGAGTTATGGTCGGATATGCATGGTAGTTGAGTGTGGTTACGAGTTGATAGAGGGAGAACATTGAGGAAGAAGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTGTGTGTGTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATA",
    "TCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGGAAGAGGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACG",
    "ATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTCTAGGGGTTTTTTGGGGAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGC",
    "ATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAAGGATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAATAGGTGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAAATGGGTCTTGAGGGGTTTTTTGGGCAAGATAACAGATAGATTTGCACTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGC",
    "CAGTGTATCTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTTTAGAGTTGATAGAGGGAGAACGTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTTTATCAGAGCAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATGACAGATCTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGTGA",
    "CGTATTGCTCAGTGCTCTTGTGGTCTGATTTGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGGAGCGCAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTCTGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGCAAAGATAAGAGATACTTCGGTATCTGTTATCTGTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGCGCGATTGAGGACCTTCAGTGC",
    "TTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTGAAACGGGTCTTGAGGGGTTTTTTGGGAAAGAATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGGCTCTGTGTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACG",
    "CGTATTGCTCAGTGCTCTTGTGGTCTGATTTGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGTGTGTTTGTAAAAGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTACGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATAAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAGATAACAGATACTTCGGGTTCCTGTTATCTGTTTTTTTTCACAGTTAGCCGCGTTCGCGCGGCGATCGCTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGCGCGAAGCATACGTAACTGAACCAAAGACGACACATACA",
    "ATGTGTAACCTACTCAACCAAGTCTCGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTGTAATTAGACTCACTATATAGGGACAACATTTAGGCCCTATCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGTAATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGAAGTGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAGGTTCAAGGTAAGTTGAGTGGAGGATACTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGAGAATGTGGGGTGGACGGATTATGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGGATGGATTTGGTGGGAAGAAGAAGACGCGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTCAACAGTTAGCCGCGTTCGCACAGCTATATATTTTGGTGGAAGGGCTTCGGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACG",
    "ACATTTTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACCATAATACGAACGTTAGCAGCACTATTCGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAGTGTGACATAAGATGGAGATAGAGGGTGATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGACATGAGTTTGAAGGTTGGAGAGAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAGGGTCTTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTAGGGTCTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGAGGCCGAGTCGGTCGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAGTGATTCAGTTGGTGGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAAAACGGGTCTTGAGGGGTCTTGGGGAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCCATCCCCGTTTTTTTTCTTCTGGTGGAAGGGCTCGGAGTTGTGGAATCTATGTATCCTGGCTCCTCAGCGGCTTTCGTTGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGCTGAACCAAGCAGACCACAT",
    "CCACGTATTGCTCAGTGCTCTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAAGAGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGGAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGGTAACTGAACGAGAT",
    "TGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGGAGGTTCAGGAGAGGGATCATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAGGCTCAGGTGAGAAGGATATGGAATCGCGAAGTGGAAGGTGAGATTGAGTCGGTGGAAGAAGGGAAAATGATTGAAGTTGGTGGTTGTATCATCGCCAGATTTAGTAGAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGTAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTTGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAACAATACGTAGTTGTCAAGTGCTGACAT",
    "ATGTACATCTACTCGTTCCGTTGGTCTTTGCTGAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACGTTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAAATAACAGATACTTCGGTATCTGTTCTGTATTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGCATCTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGTAGT",
    "TTGTGTTTTGTCCCTCTACTCGTAGGTTGGTCTTGCTCAGTGCTCTTGTGGGTCCAATTAGCCAGATGATAAGGAACTACGTAATTAGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTACTGGTGGGTGTGAGAGCGGATGTTATGATCTAGGTATTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTAGTTGAGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTCTGTGTTCAACAGTTAGCCGCGTTCGCGCGGCTATATATTTTGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCGAGTGCGATTGAGGACCTTCAGTGCAGCAATACGTA",
    "AACGTATTGCTCAGTGCTCTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGAGATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGCGAGTAGTAGTGAGTCTGGAAGGATGAGACTAGTTTGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAAGAGGGTCAGTTGAGGTGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGCGCGATTTGAGGACCTTCAGTGCAGCAATACGAACGGTCGCTGGCCGCAGGGACG",
    "CGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTGCATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTCGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGCTGCTAAGGGTTAGACATAAACAT",
    "ATGTCCCTGTACTTGGTTAGAATTATCGTATTGCTCAGTGCTCTTGTGGGTCCGATTCACAATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGGGAGGAGATTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGATCTGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGGAAGAAGGAGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATATATTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGT",
    "ACGTATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACTGAACGTTAGCAGCACTATTCGGGCCTATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGTAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGAAGGTTGGAGAGAAGATGTCGTGAGAGAAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTCTAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATACGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAGGTCAGGCGGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAAAGATAACAGATACTTCGGTATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGGTAATTTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAATACGTAGTCTAGTCAAGCAGACGACATAAACAT",
    "TAAATGATATTGCTCAGTGCTCTTGTGGGTCCGATTCGCCAGATGATAAGGAACTACATAATACGACTCACTATAGGGACGAACGTTAGCAGCACTATTCTCGGGCCATCTCAGTGCAACAGTAACGTTAGCTAGCCTCAGAGGAAAGGAGAAAGGTGACATAAGATGGAGATAGAGGGCTATGATGGAGAGAGGAGGTTCAGGAGGAGGAATTATGGTGACAGAGTGAGAAGAGTAGTGGAGTCTGGGAGGATTGAGATGAGTTCGGAGGTTGGAGAGAAAGACGTCGGCGAGAGAGGAGGATGATACTGTGATGGAAGTTAGAGGGTCAGTTGAGGTGGGAGAGTATTCGAGAGTTGTATGTTAGGGTGCGAGATTAGAGATGAGTTGGACGTTGAGGAATGGTGGATTGACAGGATGTGGGATGGGTTTTACAGGGTGTGTTTGTAAAGGGTCTAGCAATTACATAACAGATAGTGTGTTGTGTGTAGGGTTATGCTTATTGGTGGGTGTAGAGAGCGGATGTTATGAGTGTTGGGTCTTGGAGGTGTAGGGAGTAAACGGTTGTGGGTAATGAGTTGACTGAGGTTGTGGTTGGATATGCATGGTAGTTGAGTGTGGTTGCTAGAGTTGATAGAGGGAGAACGTTGAAAGAGGATGAAGAGGCGGAGTAGTTGGTTGTTAGGACAGTTGGGTATGGAGAAAGGTCAGGGTGAGAAGGATATGGATCGTGAAGTGGAAGGTGAGATTCAGTTGGTGGGAAGAAGGAAACGAGATTGAAGTTGGTGGTTGCATCACATTGCCATCAGTAATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGGGAACATATATATTTTGCATCTGTTATCTGTTTTTTTTCAACAGTTAGCCGCGTTCGCGCGGCTATCTGTTTTTTTTGGTGGAAGGGCTCGGAGTTGTGGTAATCTATGTATCCTGGCTCCTCAGCGGTCTTTCGTCGACCTTGATGTTTCCAGTGCGATTGAGGACCTTCAGTGCAGCAGAACACCAATGGTACCAGTAGGTTAACAGT",
]

#aligner = pyabpoa.msa_aligner(aln_mode="l")

def poa(seqs, **kwargs):
    aligner = pyabpoa.msa_aligner(aln_mode="l")
    res = aligner.msa(seqs, out_cons=True, out_msa=False, max_n_cons=1, min_freq=0.25)
    return res.cons_seq


for i in range(1):
    res = poa(bad_seqs)

for i in range(2):
    res = poa(good_seqs)

for i in range(1):
    res = poa(bad_seqs)

print("RESULT:")
print(res)

I get a crash:

...
7f795ec6c000-7f795ec6e000 rw-p 0005c000 00:2b 82759669615                /home/jqs1/micromamba/envs/abpoa_test/lib/python3.12/site-packages/pyabpoa.cpython-312-x86_64-linux-gnu.so
7f795ec6e000-7f795edf6000 rw-p 00000000 00:00 0
7f795edff000-7f795ee04000 rw-p 00000000 00:00 0
7f795ee04000-7f795ee0b000 r--s 00000000 fd:00 50346192                   /usr/lib64/gconv/gconv-modules.cache
7f795ee0b000-7f795ee0c000 rw-p 00000000 00:00 0
7f795ee0c000-7f795ee0d000 r--p 00021000 fd:00 50332498                   /usr/lib64/ld-2.17.so
7f795ee0d000-7f795ee0e000 rw-p 00022000 fd:00 50332498                   /usr/lib64/ld-2.17.so
7f795ee0e000-7f795ee0f000 rw-p 00000000 00:00 0
7fff2e964000-7fff2e986000 rw-p 00000000 00:00 0                          [stack]
7fff2e999000-7fff2e99b000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted

Something isn't being cleaned up when the pyabpoa.msa_aligner object is garbage-collected, because if I share the same aligner instance across aligner.msa calls, I get no crash:

aligner = pyabpoa.msa_aligner(aln_mode="l")

def poa(seqs, **kwargs):
    #aligner = pyabpoa.msa_aligner(aln_mode="l")
    res = aligner.msa(seqs, out_cons=True, out_msa=False, max_n_cons=1, min_freq=0.25)
    return res.cons_seq

for i in range(1):
    res = poa(bad_seqs)

for i in range(2):
    res = poa(good_seqs)

for i in range(1):
    res = poa(bad_seqs)

The above test scripts define two lists of sequences bad_seqs and good_seqs. Aligning bad_seqs more than once during the lifetime of the Python process, even if these are interspersed with aligning good_seqs (or any other list of sequences), is sufficient to trigger a crash.

Out of ~100k groups of 10-50 sequences (all ~4kb) I've tried aligning (in the context of the sequencing pipeline I'm working on), I've found ~4 sequence groups that trigger a crash. I've only double-checked that I can get a reproducible crash for one of them (bad_seqs), but I can go try to find other examples if that'd help.
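For hunting down the other crashing groups, one option is to run each alignment in a throwaway child interpreter so that an abort only kills the child. The following is a minimal sketch, not part of the original report: run_isolated, crashed and build_snippet are hypothetical helper names, and the generated snippet simply repeats the msa_aligner/msa calls from the test script above. The snippet is passed on stdin because a group of ~4kb sequences can exceed the length limit for a single command-line argument.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-"],  # "-" makes the child read its program from stdin
        input=snippet,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def crashed(returncode):
    """On POSIX, a negative return code means the child was killed by a signal."""
    return returncode < 0


def build_snippet(seqs, repeats=2):
    """Build a child program that aligns `seqs` `repeats` times with a fresh aligner.

    Assumption: the call arguments mirror the test script in this report.
    """
    return (
        "import pyabpoa\n"
        f"seqs = {seqs!r}\n"
        f"for _ in range({repeats}):\n"
        "    aligner = pyabpoa.msa_aligner(aln_mode='l')\n"
        "    aligner.msa(seqs, out_cons=True, out_msa=False,\n"
        "                max_n_cons=1, min_freq=0.25)\n"
    )
```

A driver would then loop over the sequence groups, call run_isolated(build_snippet(group)), and keep every group for which crashed(returncode) is true. A positive return code would instead point to an ordinary Python exception in the child, such as a failed import.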

Another clue, in case it's useful: while I was coming up with this minimal reproducible example, I would occasionally get the output [abpoa_graph_node_id_to_index] Wrong node id: 19464488 before Python crashed. I can no longer get it to print that, and I'm not sure why.

abpoa segfaults when building with sse4.1 but not avx2

I have been getting some rare segmentation faults from abpoa. They only reproduce when compiling with sse4.1. I've been using sse4.1 to increase portability as well as save on cloud costs. I suspect maybe my input sizes and score matrices are too big for the registers used?

Here is an example that segfaults with sse4.1 but runs through with avx2 on this architecture:

wget http://public.gi.ucsc.edu/~hickey/debug/abpoa-sse41-segfault.tar.gz
tar zxf abpoa-sse41-segfault.tar.gz
abpoa ./ap_in_140727466179648.fa -O 400,1200 -E 30,1 -b 300 -f 0.025000 -t ./ap_in_140727466179648.fa.mat -r 1 -m 0 -N > ./ap_in_140727466179648.fa.out

Is this a bug in abpoa?

If not, is it possible to know a priori which inputs I can expect to be able to run with which instruction settings? And, ideally, to give an error message?

Thanks for your great library and continued support!

Unstable behavior in local alignment mode on x64 linux, possibly due to uninitialized values

Hi @yangao07 ,

I'm using the latest binary release of abPOA, 1.4.1, on x64 Linux (CentOS 7). I was getting unstable, non-deterministic results, and quite often saw the error [simd_abpoa_align_sequence_to_subgraph] Error in cg_backtrack for my use case. I was able to trace this to an uninitialized memory error which seems to occur in local but not global alignment mode.

I reduced this down to a simple test case, using the following input:

test.fa:

>1
A
>2
A

And running:

valgrind ./abpoa -m1 test.fa

The top of the valgrind results show:

$ valgrind ./abpoa -m1 test.fa
==121699== Memcheck, a memory error detector
==121699== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==121699== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info   
==121699== Command: ./abpoa -m1 test.fa
==121699==
[main] CMD:  ./abpoa -m1 test.fa
==121699== Conditional jump or move depends on uninitialised value(s)
==121699==    at 0x13EC10: simd_abpoa_align_sequence_to_subgraph (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)
==121699==    by 0x141AB7: simd_abpoa_align_sequence_to_graph (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)
==121699==    by 0x10CE31: abpoa_poa (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)
==121699==    by 0x10DD4A: abpoa_msa1 (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)
==121699==    by 0x10A694: abpoa_main (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)
==121699==    by 0x10A008: main (in /home/csaunders/devel/github/abPOA/tmp/abPOA-v1.4.1_x64-linux/abpoa)

...followed by many similar memory errors that I assume are related. No issues appear for the same test in global mode.

Note also that the same error occurs for code I compiled from the head of the main branch (d2e0186).
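Independently of valgrind, the non-determinism can be made visible by running the same command several times and comparing the outputs. This is only a rough sketch under my own assumptions: output_digests is a hypothetical helper, and the abpoa command line in the comment is the one from this report with the binary path assumed to be ./abpoa.

```python
import hashlib
import subprocess
import sys


def output_digests(cmd, runs=5):
    """Run `cmd` several times; return the set of digests of (returncode, stdout)."""
    digests = set()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True)
        h = hashlib.sha256()
        h.update(str(proc.returncode).encode())
        h.update(proc.stdout)
        digests.add(h.hexdigest())
    return digests


if __name__ == "__main__":
    # Self-check with a command whose output never changes.
    print(len(output_digests([sys.executable, "-c", "print('A')"], runs=3)))
    # For the case in this report (binary path is an assumption):
    # print(len(output_digests(["./abpoa", "-m1", "test.fa"])))
```

A set with a single digest means every run agreed. More than one digest means the output or exit status varied between runs, which would be consistent with reads of uninitialized memory but does not prove it.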

Thanks for any help or suggestions you might have to stabilize this case!

Change in abpoa_msa function signature

Hi,

I have abPOA integrated within another tool and I'm using it for consensus calling. I've been using the abpoa_msa function as previously described in the example.c file. With the latest version of abPOA I noticed that the signature for this function has changed, mainly it no longer allows for a consensus to be saved to a string in memory, but now expects only a C FILE pointer to write the output to file or stdout.

Is it possible to bring back or add an option to save the consensus in memory with the latest version?

Thanks

abPOA conda installer broken?

I believe there is something wrong with the abPOA conda installer as the installed version breaks even with the test data (seq.fa).

conda create -n abpoa_env -c bioconda abpoa -y
conda activate abpoa_env
abpoa seq.fa > cons.fa
[main] CMD:  abpoa -r3 test_poa.fa
Illegal instruction (core dumped)

As a workaround, I built abPOA from source and experienced no issues there.
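An "Illegal instruction" crash is commonly caused by a binary that was built with SIMD instructions the host CPU does not support, although that is not confirmed for this package. A minimal sketch for checking what the CPU advertises on x86 Linux follows. The helper names simd_flags and read_cpuinfo are hypothetical, and the list of flags to look for is my own choice rather than anything abPOA documents.

```python
def simd_flags(cpuinfo_text, wanted=("sse2", "sse4_1", "avx2", "avx512bw")):
    """Report which of the `wanted` CPU flags appear in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists capabilities on lines of the form "flags : fpu vme ..."
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {name: name in flags for name in wanted}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    for name, present in simd_flags(read_cpuinfo()).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

If avx2 is reported as missing on the machine where the conda build crashes while the source build works, that would suggest the packaged binary targets a newer instruction set than the CPU provides.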

Incremental update of the graph

Hello,

Thank you very much for abPOA, it is a nice tool, installation was fast and easy.

I would like to be able to incrementally update the POA graph. It would be very practical to use the graph as a compressed, aligned version of the sequences and to add sequences to it as they accumulate over time.

Typical use case would be to first generate a graph (for instance in gfa format) and then to be able to add sequences to this graph with additional commands. For instance with an --increment option:

abpoa -r 3 seqs.fa > graph.gfa

abpoa --increment newseqs.fa graph.gfa > newgraph.gfa

Best regards,

Hugues

Homopolymer indels not consistently aligned

Hi, I am trying to get a reasonable alignment in a region which has some tandem repeats, flanked by non-repetitive sequence. I can get good (enough) results in the tandem region using these parameters:

abpoa \
-n 10 \
--progressive \
--amb-strand \
-b 1000 \
-r 1 \

However, in the (mostly non-repetitive) flanking region there is a long homopolymer, where I get this result:

TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGTCTGGGCAACATAGTGAGACATTGTCTCTAC------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGTCTGGGCAACATAGTGAGACATTGTCTCTAC------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------AAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------AAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------------------------AAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------------------------AAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC

Here it seems to arbitrarily assign different paths to the same AC prefix. Do you think this can be resolved with parameter choices, or is this an unavoidable aspect of POA?
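To put a number on the inconsistency, the rows of the alignment can be grouped by where their first gap run starts and how long it is. This is a rough sketch and not anything abPOA provides: gap_patterns is a hypothetical helper, it expects the MSA rows as plain strings, and it only looks at the first run of "-" in each row.

```python
from collections import Counter


def gap_patterns(rows):
    """Count (first gap column, gap run length) over the rows of an MSA.

    Rows without any gap are counted under the key (None, 0).
    """
    counts = Counter()
    for row in rows:
        start = row.find("-")
        if start == -1:
            counts[(None, 0)] += 1
            continue
        end = start
        while end < len(row) and row[end] == "-":
            end += 1
        counts[(start, end - start)] += 1
    return counts


if __name__ == "__main__":
    toy = ["AC--AAA", "AC--AAA", "A---AAA", "ACAAAAA"]
    for pattern, n in gap_patterns(toy).most_common():
        print(pattern, n)
```

Rows that carry the same homopolymer length but land in different buckets are the ones where the gap placement differs, which makes it easier to compare parameter settings against each other.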

Thanks

strange Segmentation fault when trying to integrate abpoa_msa subfunctions into C++ smoothxg

Hi @yangao07 ,

I am stuck trying to integrate abPOA into smoothxg https://github.com/subwaystation/smoothxg/blob/78d9117df023d5618a8efdb4041d14438a787193/src/smooth.cpp#L117.

Invoking the abpoa_msa within the project works flawlessly.

However, I started to only use a subset of the abpoa_msa function, as I will need the abpoa_res_t and subsequently the DAG, which I want to convert into our odgi graph data structure. This always resulted in:

0x0000555555768c20 in abpoa_add_subgraph_alignment ()
(gdb) bt
#0  0x0000555555768c20 in abpoa_add_subgraph_alignment ()
#1  0x000055555576905f in abpoa_add_graph_alignment ()
#2  0x0000555555694fc1 in smoothxg::smooth_abPOA (graph=..., block=..., block_id=<optimized out>, consensus_name=...) at /home/heumos/git/smoothxg/src/smooth.cpp:93
#3  0x00005555556955c5 in operator() (__closure=0x555555908f40, block_id=0, tid=<optimized out>) at /home/heumos/git/smoothxg/src/smooth.cpp:321
#4  0x000055555569694f in std::__invoke_impl<void, smoothxg::smooth_and_lace(const xg::XG&, const std::vector<smoothxg::block_t>&, int8_t, int8_t, int8_t, int8_t, int8_t, int8_t, const string&)::<lambda(uint64_t, int)>&, long unsigned int, int> (__f=...) at /usr/include/c++/10.2.0/bits/invoke.h:103
#5  std::__invoke_r<void, smoothxg::smooth_and_lace(const xg::XG&, const std::vector<smoothxg::block_t>&, int8_t, int8_t, int8_t, int8_t, int8_t, int8_t, const string&)::<lambda(uint64_t, int)>&, long unsigned int, int> (__fn=...)
    at /usr/include/c++/10.2.0/bits/invoke.h:110
#6  std::_Function_handler<void(long unsigned int, int), smoothxg::smooth_and_lace(const xg::XG&, const std::vector<smoothxg::block_t>&, int8_t, int8_t, int8_t, int8_t, int8_t, int8_t, const string&)::<lambda(uint64_t, int)> >::_M_invoke(const std::_Any_data &, unsigned long &&, int &&) (__functor=..., __args#0=<optimized out>, __args#1=<optimized out>) at /usr/include/c++/10.2.0/bits/std_function.h:291
#7  0x000055555569b035 in std::function<void (unsigned long, int)>::operator()(unsigned long, int) const (__args#1=<optimized out>, __args#0=<optimized out>, this=<optimized out>) at /usr/include/c++/10.2.0/bits/std_function.h:617
#8  paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}::operator()(int) const (thread_id=<optimized out>, 
    this=0x5555558c3fe0) at /home/heumos/git/smoothxg/deps/paryfor/paryfor.hpp:704
#9  std::__invoke_impl<void, paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}, unsigned long>(std::__invoke_other, paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}&&, unsigned long&&) (__f=...)
    at /usr/include/c++/10.2.0/bits/invoke.h:60
#10 std::__invoke<paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}, unsigned long>(paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}&&, (std::__invoke_result&&)...) (__fn=...) at /usr/include/c++/10.2.0/bits/invoke.h:95
#11 std::thread::_Invoker<std::tuple<paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0x5555558c3fd8) at /usr/include/c++/10.2.0/thread:264
#12 std::thread::_Invoker<std::tuple<paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}, unsigned long> >::operator()() (this=0x5555558c3fd8) at /usr/include/c++/10.2.0/thread:271
#13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<paryfor::parallel_for<unsigned long>(unsigned long const&, unsigned long const&, unsigned long const&, std::function<void (unsigned long, int)> const&)::{lambda(int)#1}, unsigned long> > >::_M_run() (this=0x5555558c3fd0) at /usr/include/c++/10.2.0/thread:215
#14 0x00007ffff7e5bc24 in std::execute_native_thread_routine (__p=0x5555558c3fd0) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#15 0x00007ffff7bd03e9 in start_thread () from /usr/lib/libpthread.so.0
#16 0x00007ffff7afe293 in clone () from /usr/lib/libc.so.6

Then I just copied the whole abpoa_msa function into smoothxg, but I still got the same error.

So I went back to your example.c and copied abpoa_msa there. This works. But doing this in my C++ code didn't work.
I am really puzzled. Do you have any idea?

Thanks!
Best,
Smon

Can't install versions later than 1.0.2 using pip on Ubuntu 20.04 LTS in Docker

On Ubuntu 20.04 LTS (and on Debian) I cannot freshly (i.e. without earlier versions installed) install the later versions of pyabpoa using pip. The only workaround is to install version 1.0.2 and then update. This makes installing pyabpoa in a docker image extremely hard.

Error message:

#16 1.287 Collecting pyabpoa==1.2.4
#16 1.411 Downloading pyabpoa-1.2.4.tar.gz (138 kB)
#16 1.604 ERROR: Files/directories not found in /tmp/pip-install-6dwget1n/pyabpoa/pip-egg-info

Any idea how to fix this?

Thank You!
