martinjvickers / kast

Perform Alignment-free k-tuple frequency comparisons from sequences. This can be in the form of two input files (e.g. a reference and a query) or a single file for pairwise comparisons to be made.

License: Other

CMake 0.76% C++ 79.01% R 0.66% Python 14.46% Shell 5.12%

kast's Issues

Issue with markov order

With m=1, d2s, d2star (and d2) all give exactly the same results as Martin's Python code.

With m=2, the numbers diverge. The divergence is smaller for d2s, but d2star returns a negative number, which should never happen.

Clean up old files that are no longer needed

Add check for zero sized record in qry/ref mode
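
One way such a check could look, as a minimal Python sketch rather than KAST's actual code: parse the FASTA records and report any whose sequence is empty before doing the comparison. The function names `parse_fasta` and `empty_records` are hypothetical.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, even if it had no sequence.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def empty_records(lines):
    """Return the names of records whose sequence has length zero."""
    return [name for name, seq in parse_fasta(lines) if len(seq) == 0]
```

In qry/ref mode the same check would be run on both files, so that the tool can stop with a clear message instead of computing a distance on an empty record.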

Create a supported singularity image

Write recipe
Create squashfs image
Upload to Singularity Hub

Limit the number of kmers used in a variety of cases

Ideas?

Markov order limited to 3
kmer size for Markov-based measures (e.g. d2s and d2star) limited to, say, 21
No limit on kmer size for count-based measures (e.g. d2, manhattan etc.)
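
The ideas above could be enforced with a small up-front validation step. The following Python sketch is only an illustration: the constants mirror the numbers suggested in this issue, the "21" is assumed to refer to the kmer size, and `check_kmer_limits` is a hypothetical name, not a KAST function.

```python
# Hypothetical limits taken from the ideas listed in this issue.
MAX_MARKOV_ORDER = 3
MAX_MARKOV_KMER = 21
MARKOV_MEASURES = {"d2s", "d2star"}


def check_kmer_limits(measure, k, markov_order=0):
    """Return None if the settings are acceptable, otherwise an error message."""
    if k < 1:
        return "kmer size must be at least 1"
    if measure in MARKOV_MEASURES:
        if markov_order > MAX_MARKOV_ORDER:
            return f"markov order {markov_order} exceeds the limit of {MAX_MARKOV_ORDER}"
        if k > MAX_MARKOV_KMER:
            return f"kmer size {k} exceeds the limit of {MAX_MARKOV_KMER} for {measure}"
    # Count-based measures (d2, manhattan, ...) are left unlimited.
    return None
```

Returning a message rather than raising keeps the decision about warning versus aborting with the caller.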

Add header files for distances and utils

Create a supported Docker image

App tests AminoAcid

Create some known file output tests to use for complete unit tests.

Detect output file and overwrite

Get Travis working

Maybe something like this to run the tests on various distros?

https://gist.github.com/jamesarosen/e29076bd81a099f0f72e

And for CentOS we could use Docker on the Ubuntu Travis VM.

Write test files for additional file formats

SAM/BAM, EMBL, GBK etc.

App tests for DNA5

Create some known file output tests to use for complete unit tests.

Run Travis install on several Ubuntu and CentOS versions

Rename from alfsc_rewrite to alfsc

Create Galaxy wrapper in a repository

Create wrapper
Publish into a galaxy toolshed

Comment functions in utils, distance and seq

dai and hao return different results from d2tools

Investigate issue with d2s-opt

d2s-opt gave “nan” for everything.

Find an ngd and bc result example for the tests, as both currently return the same result

Investigate potential dai issue

The dai metric gives 0 for all tests, so something looks to be up with that one.

Issue with negative score returned by d2 distance

KAST gives a d2 score of -1.09891 when comparing a large (5000 bp) fragment.
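
For reference, here is a minimal Python sketch of a d2 calculation. It assumes the normalised form 0.5 * (1 - D2 / (|X| |Y|)), where D2 is the sum over kmers of the product of the two counts; whether KAST uses exactly this form is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping kmer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(seq_a, seq_b, k):
    """Assumed definition: 0.5 * (1 - D2 / (|X| * |Y|)), D2 = sum of X_w * Y_w."""
    x = kmer_counts(seq_a, k)
    y = kmer_counts(seq_b, k)
    # Counter returns 0 for kmers that are absent from y.
    dot = sum(count * y[word] for word, count in x.items())
    norm_x = sqrt(sum(c * c for c in x.values()))
    norm_y = sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("sequence is shorter than k")
    return 0.5 * (1.0 - dot / (norm_x * norm_y))
```

Because counts are never negative, under this definition the score lies between 0 and 0.5 up to floating-point rounding, so a value such as -1.09891 cannot come from rounding alone.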

Use RAM to store pairwise comparisons. (TBH, why would you want to do a pairwise comparison on something that big? And if it is that big and you do it on disk, it would take forever despite using no RAM.)

The main thing to do here is to check the amount of RAM first, and then fail fast with a warning.
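
A rough sketch of the fail-fast idea in Python, assuming the pairwise results are held as a full n x n matrix of 8-byte values. All names here are hypothetical, and `total_physical_bytes` relies on POSIX `sysconf` names that are available on Linux but not on every platform.

```python
import os


def total_physical_bytes():
    """Total physical RAM via POSIX sysconf (Linux-oriented, not portable)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def pairwise_matrix_bytes(n_sequences, bytes_per_value=8):
    """Memory needed for a full n x n matrix of results."""
    return n_sequences * n_sequences * bytes_per_value


def check_memory(n_sequences, available_bytes, bytes_per_value=8):
    """Raise MemoryError before any work starts if the matrix cannot fit."""
    needed = pairwise_matrix_bytes(n_sequences, bytes_per_value)
    if needed > available_bytes:
        raise MemoryError(
            f"pairwise matrix needs {needed} bytes but only "
            f"{available_bytes} bytes are available"
        )
    return needed
```

The check would run right after counting the input records and before any kmer counting, so the user gets the warning immediately.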

Add functioning unit test for masked count

The current test confirms that it runs without error, but doesn't check the result.

Deal with UI

e.g. making sure that the correct options are available to the user for the data they supply. Now that we've included protein data as an input (which I think is great), I think we need to decide which options (e.g. reverse complement, distance measures, kmer sizes etc.) are available to the user. For example, I could easily auto-detect whether we have nucleotide or protein data and then give a warning about the options.
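
The auto-detection idea could be sketched like this in Python. This is only an illustration: the 0.9 threshold, the nucleotide letter set and both function names are assumptions, not anything KAST defines.

```python
# Letters treated as nucleotide symbols (assumed set, includes U and N).
NUCLEOTIDES = set("ACGTUN")


def guess_alphabet(seq, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the fraction of nucleotide letters."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    fraction = sum(c in NUCLEOTIDES for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def option_warnings(alphabet, reverse_complement):
    """Return warnings for options that do not apply to the detected alphabet."""
    warnings = []
    if alphabet == "protein" and reverse_complement:
        warnings.append("reverse complement has no meaning for protein data")
    return warnings
```

A threshold below 1.0 tolerates the odd ambiguity code in nucleotide data, at the cost of possibly misclassifying very short or unusual protein sequences.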
