zyxue / kleat
Cleavage site prediction via de novo assembly
License: MIT License
adf['ctg_hex_dist'] = adf.ctg_hex_pos - adf.clv
adf['ref_hex_dist'] = adf.ref_hex_pos - adf.clv
Prefer tail_base, as it sounds most relevant to APA (alternative polyadenylation).
Then use ML to figure out which PAS hexamer is more important.
Hardclip is distinct from softclip because it is related to chimeric contigs.
Check whether it is indeed necessary to consider it separately from softclip (likely).
This is because the coordinate information is currently lost when the sequence is extracted from the contig.
Currently, this search function is used to search for the hexamer in the contig; it only takes into account the extracted sequence, and the CIGAR information is missing.
Line 66 in 933e240
Maybe it's easier to define the search window (e.g. 50 bp) with respect to the reference; the actual sequence length with respect to the contig could be a few bp more or less than 50 bp.
See BTL-1171
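A minimal sketch of how a reference-defined window could be translated back to contig coordinates by walking the CIGAR. It assumes the alignment is available as a list of (operation, length) tuples using the standard BAM operation codes (the layout pysam exposes as cigartuples). The function names and the example alignment are hypothetical and not taken from kleat.

```python
# BAM CIGAR operation codes: 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7==, 8=X
ALIGNED_OPS = (0, 7, 8)      # consume both contig (query) and reference
QUERY_ONLY_OPS = (1, 4)      # insertion, soft clip
REF_ONLY_OPS = (2, 3)        # deletion, reference skip


def aligned_pairs(cigartuples, ref_start):
    """Yield (ctg_pos, ref_pos) for every aligned base (0-based)."""
    ctg_pos, ref_pos = 0, ref_start
    for op, length in cigartuples:
        if op in ALIGNED_OPS:
            for i in range(length):
                yield ctg_pos + i, ref_pos + i
            ctg_pos += length
            ref_pos += length
        elif op in QUERY_ONLY_OPS:
            ctg_pos += length
        elif op in REF_ONLY_OPS:
            ref_pos += length
        # hard clip (5) and padding (6) consume neither sequence


def ctg_span_for_ref_window(cigartuples, ref_start, win_start, win_end):
    """Return the half-open contig span covering reference [win_start, win_end)."""
    hits = [c for c, r in aligned_pairs(cigartuples, ref_start)
            if win_start <= r < win_end]
    if not hits:
        return None
    return min(hits), max(hits) + 1


# Hypothetical alignment: 20 matches, a 2 bp insertion in the contig, 30 matches.
cigar = [(0, 20), (1, 2), (0, 30)]
span = ctg_span_for_ref_window(cigar, ref_start=1000, win_start=1010, win_end=1050)
print(span, span[1] - span[0])  # (10, 52) 42
```

In this made-up case a 40 bp reference window corresponds to 42 bp of contig because of the insertion, which is the "few bp more or less" effect described above.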
contig: A0.S101817
clv: 204513795
strand: +
ctg_hex: AATAAA
ref_hex: NA
Extracted seqs:
ctg: GAATAAAAGTTGAAGCTGCTGATACTGAACAAACAAGTGAAGAAGTAGGG
ref: ATAAAAGTTGAAGCTGCTGATACTGAACAAACAAGTGAAGAAGTAGGGAA
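The record above can be checked with a plain string search. This is only a sketch: find_hexamer is a hypothetical helper, not kleat's search function, and it looks for AATAAA alone rather than a full list of PAS hexamers.

```python
ctg = "GAATAAAAGTTGAAGCTGCTGATACTGAACAAACAAGTGAAGAAGTAGGG"
ref = "ATAAAAGTTGAAGCTGCTGATACTGAACAAACAAGTGAAGAAGTAGGGAA"
HEXAMER = "AATAAA"


def find_hexamer(seq, hexamer=HEXAMER):
    """Return the 0-based index of the first hexamer match, or None."""
    idx = seq.find(hexamer)
    return None if idx == -1 else idx


ctg_hit = find_hexamer(ctg)   # 1
ref_hit = find_hexamer(ref)   # None

# Locate the start of the reference window inside the contig window.
shift = ctg.find(ref[:20])    # 2

print(ctg_hit, ref_hit, shift)
```

The two 50 bp windows are offset by 2 bp (ctg[2:] equals ref[:48]), so the reference window begins inside the hexamer and the search on it finds nothing, consistent with ref_hex being NA.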
While benchmarking (clustering first), I realized that summing up num_suffix_reads is buggy: under the current definition, one suffix read can support multiple neighbouring cleavage sites before clustering.
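The double counting can be illustrated with a small sketch. It assumes each cleavage site keeps the IDs of its supporting suffix reads, which is not necessarily how kleat stores them. The clustering rule, the max_gap value and all names are hypothetical.

```python
def cluster_sites(site_to_reads, max_gap=20):
    """Merge neighbouring cleavage sites and count their suffix reads two ways.

    site_to_reads maps a clv coordinate to the set of suffix-read IDs that
    support it. Sites closer than max_gap to the previous site join its cluster.
    """
    clusters = []
    for clv in sorted(site_to_reads):
        reads = site_to_reads[clv]
        if clusters and clv - clusters[-1]["sites"][-1] <= max_gap:
            current = clusters[-1]
            current["sites"].append(clv)
            current["naive_sum"] += len(reads)   # counts shared reads again
            current["read_ids"] |= reads         # counts each read once
        else:
            clusters.append({
                "sites": [clv],
                "naive_sum": len(reads),
                "read_ids": set(reads),
            })
    return clusters


# Read r2 supports both clv 100 and clv 102.
site_to_reads = {100: {"r1", "r2"}, 102: {"r2", "r3"}, 500: {"r4"}}
clusters = cluster_sites(site_to_reads)
for c in clusters:
    print(c["sites"], c["naive_sum"], len(c["read_ids"]))
# [100, 102] 4 3
# [500] 1 1
```

Summing per-site counts gives 4 for the first cluster, while only 3 distinct reads support it.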
Potential solution:
Could be more straightforward when reasoning.
main difficulty
File "stringsource", line 2, in pysam.libcalignedsegment.AlignedSegment.__reduce_cython__
TypeError: self._delegate cannot be converted to a Python object for pickling
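This kind of error typically shows up when a pysam AlignedSegment is pickled, for example when it is handed to a worker process. One common workaround is to copy the needed fields into plain Python objects first. The sketch below is an assumption about the situation, not a description of what kleat does. FakeSegment is a stand-in that only mimics the pickling failure so the example runs without pysam; the attribute names mirror real AlignedSegment attributes.

```python
import pickle


class FakeSegment:
    """Stand-in for pysam.AlignedSegment that refuses to be pickled."""

    def __init__(self, query_name, reference_name, reference_start, cigartuples):
        self.query_name = query_name
        self.reference_name = reference_name
        self.reference_start = reference_start
        self.cigartuples = cigartuples

    def __reduce__(self):
        raise TypeError("self._delegate cannot be converted to a Python object for pickling")


def to_record(segment):
    """Copy the fields needed downstream into a plain, picklable dict."""
    return {
        "query_name": segment.query_name,
        "reference_name": segment.reference_name,
        "reference_start": segment.reference_start,
        "cigartuples": list(segment.cigartuples),
    }


seg = FakeSegment("A0.S101817", "chr1", 1000, [(0, 50)])  # made-up values

try:
    pickle.dumps(seg)
    pickled_directly = True
except TypeError:
    pickled_directly = False

record = to_record(seg)
restored = pickle.loads(pickle.dumps(record))
print(pickled_directly, restored == record)  # False True
```

The plain dict can then be passed between processes, at the cost of deciding up front which fields the workers need.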
Given a chimeric contig, when a bridge read is aligned to the contig, the contig is in one piece; but when the contig is aligned to the genome, it becomes two pieces.
Problem: when looping through the contig for the first piece, how do we infer the genome coordinate of a clv that is actually aligned to the second piece?
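One possible direction, sketched under strong assumptions: keep every aligned piece of the contig together and look up the piece that contains the contig position, rather than working with one piece at a time. The Piece type and ctg_to_genome are hypothetical names. Each piece is treated as gap-free (indels ignored), and contig coordinates are taken in the original contig orientation.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Piece(NamedTuple):
    ctg_start: int    # 0-based, inclusive, on the contig
    ctg_end: int      # 0-based, exclusive
    chrom: str
    ref_start: int    # 0-based genome coordinate of the leftmost aligned base
    is_reverse: bool  # True if the piece aligns to the reverse strand


def ctg_to_genome(ctg_pos: int, pieces: Sequence[Piece]) -> Optional[Tuple[str, int]]:
    """Map a contig position to (chrom, genome_pos) via the piece covering it."""
    for piece in pieces:
        if piece.ctg_start <= ctg_pos < piece.ctg_end:
            offset = ctg_pos - piece.ctg_start
            if piece.is_reverse:
                length = piece.ctg_end - piece.ctg_start
                return piece.chrom, piece.ref_start + length - 1 - offset
            return piece.chrom, piece.ref_start + offset
    return None  # position falls in no aligned piece (e.g. clipped bases)


# Made-up chimeric contig: first 100 bp on chr1, remaining 80 bp on chr5 (reverse).
pieces = [
    Piece(0, 100, "chr1", 1000, False),
    Piece(100, 180, "chr5", 5000, True),
]
print(ctg_to_genome(10, pieces))   # ('chr1', 1010)
print(ctg_to_genome(100, pieces))  # ('chr5', 5079)
print(ctg_to_genome(200, pieces))  # None
```

A clv inferred from a bridge read at contig position 100 would then resolve to the second piece even if the first piece is the one being processed.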