azulejo's Issues

minimap2-based prototype implementation evaluation

I hacked together a method based on the minimap2 alignment strategy. The basic idea: after producing whole-genome alignments via minimap2, use the paftools liftover command to project the coordinates of genes in the query genome onto coordinates on the reference genome. Then an intervaltree-based script produces a correspondence between each original query gene and whichever reference gene best overlaps the projected coordinates (currently using a simple heuristic to resolve cases where multiple genes are overlapped). So far, the method produces results on cowpea that are fairly consistent both with the current DAGchainer-based method and with the Phytozome assignments (the exact method they use is unknown, but it appears to be based on correspondences to a single reference genome).
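The overlap-assignment step can be sketched as follows. This is an illustrative reimplementation of the heuristic using a plain linear scan rather than an interval tree; all names are hypothetical, not the actual script:

```python
# Hypothetical sketch of the overlap heuristic described above: given a
# query gene's lifted-over coordinates, choose the reference gene whose
# annotated interval overlaps it most.
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def best_reference_gene(lifted, ref_genes):
    """lifted: (start, end) projected coordinates on the reference.
    ref_genes: {gene_name: (start, end)} annotated reference intervals.
    Returns the name of the best-overlapping gene, or None if nothing overlaps."""
    best, best_ov = None, 0
    for name, (start, end) in ref_genes.items():
        ov = overlap(lifted[0], lifted[1], start, end)
        if ov > best_ov:
            best, best_ov = name, ov
    return best
```

A real implementation would use an interval tree to avoid scanning every reference gene per query, but the selection rule is the same.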

A couple of things that seem to me to be possible advantages of the whole genome-based method over the current implementation:

  • not limited in applicability to protein-coding genes; in fact, any annotated elements of a genome can in theory be put into correspondence using this approach
    - the liftover produces coordinate mappings regardless of whether the corresponding regions carry annotations, so in theory missed annotations could be recovered more directly this way
  • the whole-genome alignments should be more sensitive than gene-based alignment, since differences in introns and intergenic context help inform alignment decisions
  • the whole-genome alignments contain information that will be useful for other purposes, such as variant calling between aligned genomes (also supported by paftools); a corollary is that the alignments are easy to examine in IGV. Any of the methods can be evaluated reasonably easily using GCV, and they would probably also translate fairly naturally into the multi-genome viewer being developed by MGI (I believe that viewer is also element-centric, though not as tied to protein-coding genes). Perhaps JBrowse2 will also be a context in which such comparisons can be displayed.

My current implementation is pretty simple and has so far only been evaluated on cowpea and glycine, but it seems to be doing a competitive job. Worth further discussion with @joelb123 and @cann0010 (who may or may not receive this message, since the repo still hasn't been moved to the legumeinfo organization per #108).

get rid of bash scripts

The azulejo_tool code might be spiffy for bash, but it's still bash. Move it to Python, which will make it easier to update, use, and test.

improve intersect-anchors command

Implement a simple tool that compares two sets of adjacencies (e.g., synteny anchors, but homology clusters would also work). The comparison should use simple set logic to classify each adjacency set as one of (identity, superset, subset, incongruency) and produce both summary stats and a detailed file of differences.

This functionality exists in intersect-anchors, but some of it was lost in a crash. Time to reimplement is estimated at one hour.
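A minimal sketch of the set-logic classification, assuming each adjacency set is represented as a Python set of gene identifiers (the representation and function name are assumptions, not the actual intersect-anchors code):

```python
# Classify adjacency set `a` relative to adjacency set `b` using simple
# set logic, per the four categories named above.
def classify_adjacency_sets(a, b):
    if a == b:
        return "identity"
    if a > b:          # proper superset
        return "superset"
    if a < b:          # proper subset
        return "subset"
    return "incongruency"  # partial overlap or disjoint
```

Running the classifier over every pair of corresponding adjacency sets yields both the per-pair detail file and, by counting categories, the summary stats.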

phylogenetic distance info into homology tables

Calculate alignments and phylogenetic-distance info per homology cluster and make that info available to the proxy-gene calculation. The emphasis is on speed rather than accuracy at this point.

Percent sequence identity could also be passed along, though it is less informative than distance.
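As a reminder of why distance is more informative, a fractional identity can be converted to a crude evolutionary distance with the Poisson correction d = -ln(identity), which grows nonlinearly as identity drops; this is purely illustrative, and a real pipeline would use a proper substitution model:

```python
import math

# Poisson-corrected distance from fractional sequence identity.
# With p = fraction of differing sites, d = -ln(1 - p) = -ln(identity).
def poisson_distance(fraction_identity):
    return -math.log(fraction_identity)
```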

Add data structures for synteny blocks

We have to have consistent data structures for comparing synteny calculation methods. So far the only data structure we have is the histograms of assignments, and those are pretty much a mess for the DAGchainer method.

Create data structures for synteny blocks that can be populated by various methods and compared. The beginning of that is parsing the GFF files, which was addressed by a commit on Thursday.
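One possible shape for such a method-agnostic structure, sketched as a Python dataclass (the field names are illustrative assumptions, not azulejo's actual schema):

```python
from dataclasses import dataclass, field

# A method-agnostic record for one synteny block, to be populated by any
# of the calculation methods (DAGchainer, minimap2-based, k-mer-based).
@dataclass
class SyntenyBlock:
    query_genome: str
    ref_genome: str
    query_genes: list = field(default_factory=list)  # gene IDs in query order
    ref_genes: list = field(default_factory=list)    # matched reference gene IDs
    method: str = "unknown"                          # which method produced the block

    def size(self) -> int:
        """Number of gene pairs anchoring this block."""
        return len(self.query_genes)
```

Because every method writes the same record, blocks from different methods can be compared directly (e.g., by the intersect-anchors set logic).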

finalize algorithm defaults

There are parameters that the user can choose for the algorithm:

  1. thorny or non-thorny
  2. peatmer vs. simple k-mer
  3. k-mer length
  4. strictly adjacent vs extended adjacency (i.e., 1st, 2nd, 3rd match of peatmer)

Like most scientific algorithms, the defaults should be a defensible choice obtained by comparison against a standard. There is an expected tradeoff between completeness (the fraction of genes in a match) and colinearity (the quantity optimized by existing aligners such as DAGchainer). We will document how large this tradeoff is, justify a chosen value, and emphasize colinearity where possible.
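The two quantities in the tradeoff could be measured, for example, as follows; both metrics here are illustrative assumptions (completeness as the matched fraction, colinearity as a longest-increasing-subsequence proxy), not azulejo's actual definitions:

```python
import bisect

def completeness(matched_genes, n_genes):
    """Fraction of genes that participate in some match."""
    return len(matched_genes) / n_genes

def colinearity(ref_positions):
    """Fraction of matched genes lying on the longest strictly increasing
    run of reference positions (patience-sorting LIS in O(n log n))."""
    tails = []
    for p in ref_positions:
        i = bisect.bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    return len(tails) / len(ref_positions)
```

Sweeping the algorithm's parameters and plotting these two numbers against each other would make the size of the tradeoff explicit.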

Proposed standard for matching is DAGchainer rather than minimap2, unless there is a reason otherwise.

Proposed data set is glycine7 rather than glycine33. If a second comparison set is chosen, it might be a good plan to pick one that includes a genome with a large number of small scaffolds, where the choices are likely to be sharper.

do external merges/joins on homology and genome views

Now that homology and genome views are broken out separately (never to be recombined into one giant table, for reasons of scaling), it's necessary to do external joins on info passed from one view to another, and also on the global view (e.g., synteny blocks).

I investigated Apache Flink for this purpose, and although I like the name it seems to be heavier weight than what we need.

I propose to use named pipes and file locking to accomplish external merges and joins. This approach should scale up to the limit on file descriptors (shown by cat /proc/sys/fs/file-max), which is about 3M on my desktop system, roughly 100x the expected number of homology clusters even for largish eukaryotic genomes.

Performance will be disk-limited, but likely quite acceptable on systems with SSD storage available.
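A minimal POSIX sketch of the named-pipe idea, passing rows of one view to another through a FIFO (all names are hypothetical; file locking and error handling are omitted for brevity):

```python
import os
import tempfile
import threading

# Hypothetical demo: one "view" writes rows (one homology cluster per line)
# into a POSIX FIFO while another reads them back. In the real design each
# view would run in its own process and the FIFO would be locked.
def external_merge_demo():
    fifo = os.path.join(tempfile.mkdtemp(), "homology.fifo")
    os.mkfifo(fifo)
    rows = ["cluster1\tgeneA", "cluster2\tgeneB"]

    def writer():
        # open() on a FIFO blocks until the reader side is opened
        with open(fifo, "w") as out:
            for row in rows:
                out.write(row + "\n")

    t = threading.Thread(target=writer)
    t.start()
    with open(fifo) as src:
        received = [line.rstrip("\n") for line in src]
    t.join()
    return received
```

Because the kernel buffers the pipe, the two sides never need the whole table in memory at once, which is the point of keeping the views external.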

per-homology-family info

As of now, the homology family info is all in one giant table. This data structure will not scale.

Construct per-homology-family tables that span genomes.

Install binary dependencies

azulejo has many binary dependencies at the moment, and installing them is a pain. Implement installation as a series of commands within azulejo. Consider commonality with the "install" command in bionorm and with the installation tool in lorax, and whether this should be abstracted into a separate package.

add jemalong_A17.gnm5.ann1_6.L2RX to medicago clusters to enable comparison

I believe I've fixed the issue you were having with IDs in the GFF not matching the FASTA headers in the file:
/erdos/legumeinfo/data/public/Medicago_truncatula/jemalong_A17.gnm5.ann1_6.L2RX/medtr.jemalong_A17.gnm5.ann1_6.L2RX.protein.faa

If it checks out, we can make this the official data store version, but for now this is simply to enable testing/evaluation.
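The kind of consistency check involved could look like this; it verifies that every mRNA ID in a GFF3 file has a matching FASTA header (a hypothetical standalone check written for illustration, not the actual fix):

```python
# Check GFF3 mRNA IDs against protein-FASTA headers.
def gff_ids(gff_lines):
    """Collect ID= attribute values from mRNA features (GFF3 column 9)."""
    ids = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            for attr in cols[8].split(";"):
                if attr.startswith("ID="):
                    ids.add(attr[3:])
    return ids

def fasta_ids(fasta_lines):
    """Collect the first token of each FASTA header line."""
    return {l[1:].split()[0] for l in fasta_lines if l.startswith(">")}

def unmatched(gff_lines, fasta_lines):
    """GFF mRNA IDs with no corresponding FASTA header."""
    return gff_ids(gff_lines) - fasta_ids(fasta_lines)
```

An empty result means every annotated mRNA has a protein sequence under the same identifier.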

eliminate perl dependencies

Some of the steps in building syntenic blocks depend on Perl code that is difficult to install and maintain. Replace it with Python equivalents.

reproduce 0.9.18 results

Using glycine7 and the external homology from dagchainer_tool, verify that the current version produces the same results or justify differences.

create examples directory

Create a directory of examples that have been used in testing that users can clone. Add a command to select which example users might want to use as a template.

Implement testing

There are no tests for anything in azulejo. Implement pytest-based testing.

scan usearch identity for match with BLAST clustering

Using the histograms in #191, scan identity values in usearch clustering to find the value that most closely matches the clusters produced by dagchainer_tool with default parameters. Present the results so that a go/no-go decision can be made on using usearch for homology clustering for now. Document the results.
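The scan could be scored, for instance, by L1 distance between cluster-size histograms; this metric and the function names are assumptions for illustration, not necessarily what #191 intends:

```python
# Compare cluster-size histograms ({cluster_size: count} dicts) and pick
# the usearch identity threshold whose histogram is closest to the
# dagchainer_tool reference.
def histogram_distance(h1, h2):
    """L1 distance between two sparse histograms."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)

def best_identity(ref_hist, hist_by_identity):
    """hist_by_identity: {identity_threshold: histogram}."""
    return min(hist_by_identity,
               key=lambda t: histogram_distance(ref_hist, hist_by_identity[t]))
```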

phylogeny-aware proxy gene assignments

Proxy-gene assignment was broken at some point while introducing synteny calculations. Step one of this task is to get it going again by writing synteny assignments to homology groups (this was working but was broken by the 0.9.19 tagged commit); that is a limited job that should take well under a day.

Step two is to use phylogenetic distance, taxonomy, or both to aid in selecting proxy genes; it depends on issues #162 and #163. This step is especially important when more than one proxy gene is to be selected, e.g. genes in syntenic anchors on a chromosome plus one that is part of a clade with an organelle gene.
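One of several possible distance-based heuristics would be to pick the candidate with the smallest mean phylogenetic distance to the rest of its cluster; this is purely illustrative and not azulejo's actual selection rule:

```python
# Pick a proxy gene as the "medoid" of its homology cluster: the member
# with the smallest mean distance to all other members.
def pick_proxy(dist):
    """dist: {gene: {other_gene: distance}} pairwise-distance mapping."""
    def mean_dist(g):
        others = [d for other, d in dist[g].items() if other != g]
        return sum(others) / len(others)
    return min(dist, key=mean_dist)
```

The same scaffolding could weight taxonomy instead of (or in addition to) raw distance once #162 and #163 land.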

evaluate stats of synteny vs order

This is a science issue that depends on the code improvements in #160, #163, and #164. It also depends more on phylogenetic thinking than on coding.

Using a few agreed-upon data sets, do a critical evaluation of how best to choose proxy genes. The current logic uses identical length across syntenic sets as the first of several possible selection heuristics. There are a number of assumptions built into this choice that should be tested against the data. One is whether the restriction to a syntenic set is helpful compared with other possible choices, such as simply requiring genes from different genomes. Another is what to fall back on when length identity fails.

allow input from unmodified data store files

We will need to test the algorithm on various data sets in the data store.

Allow operation on compressed, symlinked files in a way that is compatible with (but not dependent on) the naming convention used in the data store.

long-distance relationships and orphans

This is a science issue in which software plays only a minor role, namely by being fast enough to be workable.

Inspection of a sample of homology-singleton genes shows that many of them look like spurious calls and do not align to anything in various non-redundant protein sets (except perhaps themselves). Some singletons/orphans are better left behind on the "bone pile," since they will never be phylogenetically connected with anything else in biology, much less with a set of gene families.

At the same time, knowing the genes that do have long-distance relationships makes the calculated families a bit better. It may also flag some called genes as possible contaminants (or possible horizontal transfers) from other species.

Consider whether the clustering step can feasibly accommodate a non-redundant set across all of protein space. Uniprot50 comes to mind.

If clustering with a large NR set is feasible, then there are other interesting gene sets (e.g., the PDB) to consider including from the start.
