lh3 / fermi-lite
Standalone C library for assembling Illumina short reads in small regions
License: MIT License
Is there any parameter or easy code change that could make fermi-lite strand-specific? I am happy to make the changes myself if you think this is straightforward and have some pointers. I tried removing some of the revcomp lines in the source code, but the resulting assembly was worse.
In my use case I know that all my input sequences come from the same strand, so I do not want their reverse complements to be tried for overlaps.
Thank you
We are interested in using fermi-lite for local assembly. In the readme you mention the need to automatically determine parameters for the assembly. What parameters need to be determined?
when you're ready. I've been keening for a library like this for a long time. Thanks, Heng!
Hello,
I am using the error correction code of fermi-lite in my thesis and it works pretty well. I have noticed that k-mer occurrences are counted with the help of a table built on top of a set of khash tables (enhanced with locking support). The lowest 14 bits of each table key are used to count occurrences of the corresponding k-mer: bits 0-7 count low-quality instances, and bits 8-13 count high-quality ones.
So, to extract the low-quality occurrence count, you just need to AND the key with the 0xff mask. To get the high-quality count, a right shift by 8 followed by an AND with the 0x3f mask is required.
On line 450 of the bfc.c file (the bfc_ec1dir function), there is probably a wrong mask applied:
pen.absent_high = ((s>>8&0xff) < e->opt->min_cov);
Can you look into it, please? I think I have a fairly deep understanding of the code now, but I am probably still missing a few details.
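The masking described in this issue can be sketched as follows. This is a minimal illustration that assumes the bit layout exactly as the poster describes it (bits 0-7 low-quality count, bits 8-13 high-quality count, higher bits belonging to the key); it has not been checked against bfc.c, and the function names are hypothetical. It shows why an 8-bit mask after the shift could pick up two bits that are not part of the count.

```c
#include <stdint.h>

/* Assumed layout (taken from the issue text, not verified against bfc.c):
 *   bits 0-7   low-quality occurrence count
 *   bits 8-13  high-quality occurrence count
 *   bits 14+   remaining key bits */

/* Low-quality count: lowest 8 bits. */
static unsigned kmer_low_count(uint64_t key)
{
    return (unsigned)(key & 0xff);
}

/* High-quality count: 6 bits starting at bit 8. */
static unsigned kmer_high_count(uint64_t key)
{
    return (unsigned)(key >> 8 & 0x3f);
}

/* Same shift with the 8-bit mask quoted in the issue. Under the assumed
 * layout this also reads bits 14 and 15, which are not count bits. */
static unsigned kmer_high_count_mask_ff(uint64_t key)
{
    return (unsigned)(key >> 8 & 0xff);
}
```

For a key with bit 14 set, a high-quality count of 5 and a low-quality count of 3, the 6-bit mask yields 5 while the 8-bit mask yields 69, so a comparison against min_cov could give a different answer under this layout.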
We would like to deploy fermi-lite in vg for local assembly and homogenization. Is there any particular consideration that we should take when doing this?
It may be helpful to assemble the data from many genomes in a small region (1kb-100kb for instance). What parameters might we use in that case?
This exception is thrown when short sequences are encountered.
@0:1 4281 . .
CCCACAGAACTAAAACAGAAGAATTCTC
@0:2 4281 . .
CCTAGACAGAACCCATCTAAGAAACGAC
I have seen this on occasion when a read is truncated.
Thanks.
Hi Heng,
In trying to link bwa and fml in the same executable, I ran into an issue where bseq1_t was defined differently in each library. I ended up making a fork that fixed this issue for SeqLib, but I am getting some feedback that it would be better to avoid having multiple fml / bwa clones out there and instead just have SeqLib link to the official fml.
Would you be willing to consider a PR that does the minimal amount of re-naming within fml to be able to link to bwa without multiple definition errors?
Hi,
I'm just opening this issue to link to a pull request which adds an enhancement for libSeqLib. If this patch were applied, libSeqLib could drop its incompatible code copy of fermi-lite.
Kind regards, Andreas.
Currently the code in ksw.c seems to need SSE2 to compile. It would be nice to have some kind of fallback implementation (probably slower) to improve support on architectures without these instructions.
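As a rough idea of what such a fallback might look like, here is a minimal scalar Smith-Waterman scoring routine with affine gaps. It is only a sketch: the function name, signature and scoring convention (a gap of length k costs gapo + k * gape) are assumptions made for illustration and do not reproduce the ksw.c interface. A build could select between this and the vectorized code with a preprocessor check such as `#ifdef __SSE2__`, which GCC and Clang define when SSE2 is enabled.

```c
#include <stdlib.h>
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical scalar fallback (not the ksw.c API): best local alignment
 * score between query q and target t. match, mismatch, gapo and gape are
 * all non-negative; mismatch and gap costs are subtracted.
 * Returns -1 if memory allocation fails. */
int sw_score_scalar(const char *q, const char *t,
                    int match, int mismatch, int gapo, int gape)
{
    size_t qlen = strlen(q), tlen = strlen(t), i, j;
    int best = 0;
    int *H = calloc(qlen + 1, sizeof(int)); /* scores of the previous row */
    int *F = calloc(qlen + 1, sizeof(int)); /* gap state along the target */
    if (H == NULL || F == NULL) { free(H); free(F); return -1; }
    for (i = 1; i <= tlen; ++i) {
        int diag = 0;  /* H[i-1][j-1] */
        int hleft = 0; /* H[i][j-1] */
        int e = 0;     /* gap state along the query */
        for (j = 1; j <= qlen; ++j) {
            int up = H[j]; /* H[i-1][j], read before it is overwritten */
            int h = diag + (q[j - 1] == t[i - 1] ? match : -mismatch);
            e = max2(hleft - gapo - gape, e - gape);
            F[j] = max2(up - gapo - gape, F[j] - gape);
            h = max2(h, max2(e, F[j]));
            if (h < 0) h = 0; /* local alignment: restart at zero */
            diag = up;
            H[j] = hleft = h;
            if (h > best) best = h;
        }
    }
    free(H);
    free(F);
    return best;
}
```

The routine runs in O(|q| * |t|) time and O(|q|) memory. It returns only the score, without end positions or a traceback, so it would cover only part of what an SSE2 alignment routine typically provides.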
While compiling, I got the following error:
gcc -Wall -O2 prog.c -o prog -L /usr/local/bin -lfml -lz -lm -lpthread
cc1: fatal error: prog.c: No such file or directory
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
Fedora 36.
How to solve this?
@lh3 I was playing around with this tool but I couldn't get it to work on a "simple" case. I duplicated a read 100 times and would expect it to output the duplicated read. Any thoughts?
Hi,
When aggressive trimming is set in fermi-lite to pop bubbles in heterozygous regions, what strategy is employed?
Is the longer path in the bubble kept, or the shorter one? Or the one with the highest average coverage?
I am using fermi-lite to do local assembly of ~2kb regions.
Thanks,
Cristian
Consider the following 8 reads.
>seq1
ATCCTGAGAATCAATCTGTGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTC
>seq2
GGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATCTCACAGAATCGCAAAGGAAGAAAATCAGGGCCTA
>seq3
TTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCGCAAAGGAAGAAAATCAGGGCCTACCTATCTAAATTTAAAATT
>seq4
GAAATTTTAAATTTAGATATGTAGGCCCTGATTTTCTTCCTTTGCGATTCTGTGATATTCAAGACCTGCTTCTAGATAGCTAAGAGTTCCAGCTTTTCTA
>seq5
TGAGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAAT
>seq6
TGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCG
>seq7
TTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCGCAAAGGAAGAAAATCAGGGCCTACATATCTAAATTTAAAA
>seq8
ATAGCTAAGAGTTCCAGCTTTTCTAAAAATTTTTGGTTTCCTTCCCCTCCTCCCAAGACATAATTTTCACAGATTGATTCTCAGGATTGGCAATCATGCA
A quick multiple sequence alignment shows that there is very good consensus among these 8 reads for most of the alignment.
seq1     -------------atcctgagaatcaatctgtgaaaattatgtcttgggaggaggggaag
_R_seq8  tgcatgattgccaatcctgagaatcaatctgtgaaaattatgtcttgggaggaggggaag
seq5     -----------------------------tgagaaaattatgtcttgggaggaggggaag
seq6     -------------------------------tgaaaattatgtcttgggaggaggggaag
seq2     ------------------------------------------------------gggaag
seq3     ------------------------------------------------------------
_R_seq4  ------------------------------------------------------------
seq7     ------------------------------------------------------------

seq1     gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtc-------
_R_seq8  gaaaccaaaaatttttagaaaagctggaactcttagctat--------------------
seq5     gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq6     gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq2     gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaatc
seq3     -------------tttagaaaagctggaactcttagctatctagaagcaggtcttgaata
_R_seq4  ---------------tagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq7     -----------tttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
                        *************************

seq1     -------------------------------------------------------
_R_seq8  -------------------------------------------------------
seq5     tcacagaat----------------------------------------------
seq6     tcacagaatcg--------------------------------------------
seq2     tcacagaatcgcaaaggaagaaaatcagggccta---------------------
seq3     tcacagaatcgcaaaggaagaaaatcagggcctacctatctaaatttaaaatt--
_R_seq4  tcacagaatcgcaaaggaagaaaatcagggcctacatatctaaatttaaaatttc
seq7     tcacagaatcgcaaaggaagaaaatcagggcctacatatctaaatttaaaa----
However, I cannot get fml-asm to produce any assembly from these reads. I've tried relaxing parameters in various ways but with no success. Are there any parameter settings that will assemble these reads, or is this a particularly challenging case that can't easily be solved?
Thanks!
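In the alignment above, the `_R_` prefix on seq4 and seq8 presumably marks reads that the aligner reverse-complemented before aligning them (this is an assumption based on the naming, not something stated in the issue). A small sketch for checking that by hand is shown below; `revcomp_inplace` is a hypothetical helper written for this illustration and is not a fermi-lite function.

```c
#include <stddef.h>
#include <string.h>

/* Complement of a single base; anything other than A/C/G/T becomes N. */
static char comp_base(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 'N';
    }
}

/* Hypothetical helper: reverse-complement an upper-case,
 * NUL-terminated read in place. */
void revcomp_inplace(char *s)
{
    size_t n = strlen(s), i;
    for (i = 0; i < n / 2; ++i) {
        char a = comp_base(s[i]);
        char b = comp_base(s[n - 1 - i]);
        s[i] = b;
        s[n - 1 - i] = a;
    }
    if (n % 2) /* the middle base of an odd-length read stays in place */
        s[n / 2] = comp_base(s[n / 2]);
}
```

Applied to the last 13 bases of seq8 (TGGCAATCATGCA), this gives TGCATGATTGCCA, which matches the first 13 bases of the `_R_seq8` row in the alignment.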