peri4n / bio

A search engine for sequence data

License: Apache License 2.0

Scala 95.05% JavaScript 4.95%

bioinformatics react scala search-engine sequence-alignment

bio's Introduction

Hi there 👋

🔭 I’m currently working on becoming my best self
🌱 I’m currently learning Haskell and Rust
👯 I’m looking to collaborate on any Open Source I find interesting
🤔 I’m looking for help with my OpenSchool project
💬 Ask me about mechanical keyboards :)
😄 Pronouns: He/Him

bio's Issues

Add performance tests for the FastaProcessor

Description

Multiple parts of our webapp, such as the FastaProcessor, will have performance requirements. We need to investigate possible solutions for this.

Acceptance Criteria

Come up with a way to measure the performance of test code (e.g. ScalaMeter)

Create an initial draft of a FASTA parser

Description

FASTA might not be a complex format, but it still poses a number of challenges for asynchronous parsers. Come up with a prototype of an asynchronous parser that uses Akka Streams.

Acceptance Criteria

It should parse "normal" FASTA files. No special format has to be supported as of now.
Good test coverage
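For illustration, a minimal synchronous sketch of the grouping logic such a prototype needs: a line starting with '>' opens a new record and all following lines are concatenated into its sequence. It deliberately does not use Akka Streams; FastaRecord and SimpleFasta are hypothetical names, and an Akka-based prototype would wrap comparable logic in a stream stage.

```scala
// Minimal synchronous sketch (not Akka Streams). FastaRecord and SimpleFasta
// are hypothetical names used for illustration only.
final case class FastaRecord(header: String, sequence: String)

object SimpleFasta {
  /** Groups lines into records: a line starting with '>' opens a new record,
    * and all following non-empty lines are appended to its sequence.
    * Sequence lines before the first header are ignored in this sketch.
    */
  def parse(lines: Iterator[String]): Vector[FastaRecord] = {
    val records = Vector.newBuilder[FastaRecord]
    var header: Option[String] = None
    val buffer = new StringBuilder

    // Emits the current record (if any) and resets the sequence buffer.
    def flush(): Unit = {
      header.foreach(h => records += FastaRecord(h, buffer.toString))
      buffer.clear()
    }

    for (raw <- lines) {
      val line = raw.trim
      if (line.startsWith(">")) {
        flush()
        header = Some(line.drop(1).trim)
      } else if (line.nonEmpty) {
        buffer.append(line)
      }
    }
    flush()
    records.result()
  }
}
```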

Description

We have to support multiple indices to allow users to query different data sets.

Acceptance Criteria

The REST endpoint for creating an index should be: POST /index/1

Improve stability of FastaProcessor

Description

While playing with the FileUpload feature, it became clear that the FastaProcessor is not stable enough to be used in production (it is just a prototype). Some edge cases that lead to trouble are:

Empty FASTA file / bytestring
Add more FASTA files to the tests to prove robustness against other FASTA styles
Improve error handling
Allow for comments in the file

Acceptance Criteria

Improve stability for all edge cases listed above
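A hypothetical sketch of how some of these edge cases could be handled: an empty input yields no records instead of failing, lines starting with ';' are skipped as comments (the comment convention of the original FASTA format), Windows line endings are tolerated, and malformed input is reported as a Left instead of an exception. Treating a header without any sequence as an error is an assumption of this sketch, and SafeFasta is not the project's FastaProcessor.

```scala
// Hypothetical defensive FASTA parser; names and error messages are illustrative.
object SafeFasta {
  final case class Record(header: String, sequence: String)

  def parse(input: String): Either[String, Vector[Record]] = {
    // trim also strips the '\r' of Windows line endings; blank lines and
    // ';' comment lines are dropped before parsing.
    val lines = input
      .split("\n")
      .iterator
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith(";"))
      .toVector

    if (lines.isEmpty) Right(Vector.empty) // an empty file is valid, just empty
    else if (!lines.head.startsWith(">"))
      Left(s"expected a header line starting with '>' but found: ${lines.head}")
    else {
      // Start a new record at every header, append other lines to the last one.
      val records = lines.foldLeft(Vector.empty[Record]) { (acc, line) =>
        if (line.startsWith(">")) acc :+ Record(line.drop(1).trim, "")
        else acc.init :+ acc.last.copy(sequence = acc.last.sequence + line)
      }
      // Assumption of this sketch: a header without sequence is an error.
      records.find(_.sequence.isEmpty) match {
        case Some(r) => Left(s"record '${r.header}' has no sequence")
        case None    => Right(records)
      }
    }
  }
}
```

The fold rebuilds the last record's string for every sequence line, which is fine for small test files but quadratic for very long records; a production version would accumulate into a builder instead.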

Create REST endpoint to index FASTA files

Description

Create REST endpoint to index FASTA files.

Acceptance Criteria

There should be a REST call for: /index/add

Add REST endpoint for index status

Description

It would be very helpful to get an impression of the current state of the index (e.g. how many sequences are indexed?).

Acceptance Criteria

Add a REST endpoint for the current status of the index

Description

The webapp should be almost entirely controllable via REST. As of now, there are only a few controllers but as this number increases we have to document them systematically.

Acceptance Criteria

Investigate how to document our REST API (e.g. Swagger)
Document all present REST controllers with the found solution

Remove the necessity to prepend a line delimiter before a FASTA flow

Description

The current FastaFlow expects every incoming ByteString to be exactly one line. This assumption is bound to be violated some day.

Acceptance Criteria

Calling a FastaFlow should implicitly prepend a line delimiter flow before returning the actual flow.
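What the implicitly prepended stage has to achieve can be sketched without Akka: buffer incoming fragments and only emit complete lines, keeping a trailing partial line until the next chunk arrives. LineSplitter is a hypothetical name; in Akka Streams the usual building block for this is Framing.delimiter, which a FastaFlow factory could place in front of the parsing stage.

```scala
// Hypothetical, Akka-free stand-in for a line-delimiting stage: re-chunks
// arbitrary string fragments into complete lines.
final class LineSplitter {
  private val pending = new StringBuilder

  /** Feeds one chunk and returns all lines that this chunk completed. */
  def feed(chunk: String): Vector[String] = {
    pending.append(chunk)
    val text = pending.toString
    val lastBreak = text.lastIndexOf('\n')
    if (lastBreak < 0) Vector.empty // no complete line yet, keep buffering
    else {
      pending.clear()
      pending.append(text.substring(lastBreak + 1)) // keep the incomplete tail
      // Split the completed part; strip '\r' to tolerate Windows line endings.
      text.substring(0, lastBreak).split("\n", -1).toVector.map(_.stripSuffix("\r"))
    }
  }

  /** Flushes the last line if the input did not end with a newline. */
  def finish(): Option[String] = {
    val rest = pending.toString
    pending.clear()
    if (rest.isEmpty) None else Some(rest.stripSuffix("\r"))
  }
}
```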

Description

Write about:

A quick overview: What does this project aspire to do?
The directory structure: How is the project structured?
Getting started with development: How to set it up (for now)?
Optional: How to contribute?

Acceptance Criteria

Documentation should be added.
Optional: Look for a proof-reading tool and apply it.

Document the algorithmic approach

Description

We now have a relatively clear picture of how our matching algorithm should work. To really think it through it may be best to document it.

Acceptance Criteria

Document our matching algorithm as it is currently planned.

Create a File Upload component

Description

This should serve as a mock for uploading a (FASTA) file, so the user can search our index against the contents of his file. Because there is no index as of now, it suffices to show the contents of the file to check whether our FASTA processor really works and performs well.

Acceptance Criteria

When the file is uploaded, all its contents are shown on the page.

Experiment with the search function

Description

Our search function works, but we are not 100% sure how. For example, fuzzy matching works to a certain extent, but we need to know exactly to which extent.

Acceptance Criteria

Create tests that give an impression of how the search functionality works.
If you are certain about a specific behaviour, create a test for it.

Add Configuration Library

Description

The application is starting to have parameters, so we should investigate how to handle configuration properly in Scala. A good start might be: https://github.com/lightbend/config

Acceptance Criteria

Make the K-mer size configurable.

Add Lucene sink

Description

To really make use of our super cool FileUpload pipeline, we need a sink into which all the incoming data is piped, and at the far end of that sink is Lucene.
That is why we have to come up with a custom sink for this.

Acceptance Criteria

Prototype a Lucene sink

Create a style draft

Description

At some point we have to care about styling. This issue should prototype a very basic style to see how we can integrate it into our build process.

Acceptance Criteria

All text should be written in a fancy font :)

Switch to compile-time injection

Description

I (personally) dislike runtime dependency injection: I want the compiler to check whether everything is wired correctly. This ticket should migrate the application from the Play default (runtime injection via Guice) to reader monads (compile-time).

This may not be super urgent but it is easier to switch now than when the application is large.

Acceptance Criteria

Everything should still work
Remove Guice from the webapp dependencies
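A minimal sketch of the reader-monad style of compile-time wiring, assuming a hand-rolled Reader type (libraries such as Cats provide one as well): every component declares the environment it needs, and the application supplies that environment once at the edge. AppEnv and Services are hypothetical; because dependencies are ordinary fields of the environment type, a missing one is a compile error rather than a runtime injection failure.

```scala
// A computation that needs an environment E to produce an A.
final case class Reader[E, A](run: E => A) {
  def map[B](f: A => B): Reader[E, B] = Reader(e => f(run(e)))
  def flatMap[B](f: A => Reader[E, B]): Reader[E, B] = Reader(e => f(run(e)).run(e))
}

object Reader {
  /** Gives access to the environment itself. */
  def ask[E]: Reader[E, E] = Reader[E, E](e => e)
}

// Hypothetical application environment and services, for illustration only.
final case class AppEnv(kmerSize: Int, indexName: String)

object Services {
  val kmerSize: Reader[AppEnv, Int] = Reader.ask[AppEnv].map(_.kmerSize)

  def describeIndex: Reader[AppEnv, String] =
    for {
      k   <- kmerSize
      env <- Reader.ask[AppEnv]
    } yield s"index '${env.indexName}' uses k = $k"
}
```

Wiring then happens once, e.g. Services.describeIndex.run(AppEnv(8, "ncbi")), instead of inside a runtime injector.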

Add commandline parser

Description

We need a way of specifying the parameters of our project. For example, our configuration can hold both production and test databases, but the database that is actually used at runtime still has to be specified.

Acceptance Criteria

Add a command-line parser to the project and evaluate properties such as the database to be used.
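A hypothetical sketch of a minimal hand-written parser; the flags --database and --kmer-size, the default database name and the CliOptions type are assumptions for illustration. A dedicated library such as scopt would likely be the more robust choice once the number of options grows.

```scala
// Hypothetical options; the default database name "test" is an assumption.
final case class CliOptions(database: String = "test", kmerSize: Option[Int] = None)

object Cli {
  /** Parses flags left to right, returning an error message for bad input. */
  def parse(args: List[String], acc: CliOptions = CliOptions()): Either[String, CliOptions] =
    args match {
      case Nil => Right(acc)
      case "--database" :: name :: rest => parse(rest, acc.copy(database = name))
      case "--kmer-size" :: n :: rest =>
        n.toIntOption match {
          case Some(k) if k > 0 => parse(rest, acc.copy(kmerSize = Some(k)))
          case _                => Left(s"invalid k-mer size: $n")
        }
      // Unknown flags, or a flag missing its value, end parsing with an error.
      case unknown :: _ => Left(s"unknown or incomplete option: $unknown")
    }
}
```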

Migrate to Akka Http

Description

After spending a month digging deep into the Play framework and Akka, I can hardly see an advantage of using Play over plain Akka HTTP. A lot of things should get simpler once we switch to Akka HTTP entirely, such as compile-time dependency injection.

Acceptance Criteria

Migrate every controller to Akka Http

Create webapp subproject

Description

Further down the road I want this project to have a UI. This should be done with a webapp. To clearly separate the logic of the webapp from other parts of the project we have to create a subproject.

Acceptance Criteria

A Play subproject should be created.
You can test the webapp via sbt webapp/run

Separate the index logic into another subproject

Description

The Lucene indexing logic (its tokenizers and so on) is independent of Akka Streams and our domain models. Therefore, it should be separated into a subproject (named index).

Acceptance Criteria

Create a subproject that contains all the Lucene indexing logic

Add ability to search for the reverse complement

Description

In issue #25 we investigated the possibility of also indexing the reverse complement. It turned out that this is the wrong approach. Instead, we should additionally reverse-complement the search query and search with both queries against the same index.

Acceptance Criteria

Add a checkbox in the UI that toggles whether the reverse complement should also be searched for.
Adjust the REST endpoint for searching accordingly.
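The query-side approach can be sketched as follows: compute the reverse complement of the query and, when the checkbox is ticked, search with both strings. The accepted alphabet (A, C, G, T and N, case-insensitive) and the names are assumptions; IUPAC ambiguity codes and RNA are out of scope here.

```scala
object ReverseComplement {
  // Complement table; only A, C, G, T and N are supported in this sketch.
  private val complement: Map[Char, Char] =
    Map('A' -> 'T', 'T' -> 'A', 'C' -> 'G', 'G' -> 'C', 'N' -> 'N')

  /** Reverse complement of a DNA string, or an error for unsupported symbols. */
  def apply(dna: String): Either[String, String] = {
    val upper = dna.toUpperCase
    upper.find(c => !complement.contains(c)) match {
      case Some(bad) => Left(s"unsupported nucleotide: $bad")
      case None      => Right(upper.reverse.toList.map(complement).mkString)
    }
  }

  /** Queries to send to the index: the original, plus its reverse complement if requested. */
  def queries(dna: String, includeReverseComplement: Boolean): Either[String, List[String]] =
    apply(dna).map { rc =>
      val original = dna.toUpperCase
      // A palindromic query is its own reverse complement; search it only once.
      if (includeReverseComplement && rc != original) List(original, rc) else List(original)
    }
}
```

Palindromic queries such as ACGT are their own reverse complement, so the sketch sends them to the index only once.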

Add test coverage CI analysis

Description

It is too easy to lose track of test coverage. That's why we have to set up a pipeline for this.

Acceptance Criteria

Integrate with http://coveralls.io

Create a search component

Description

The user needs to submit his search queries via the UI. We have to build a component for this.

Acceptance Criteria

Reintroduce the search bar component

Add Issue templates

Description

It is becoming tedious to write the same scaffold over and over again. GitHub supports issue templates, so we should use them.

Acceptance Criteria

Whenever a new issue is created our scaffold should already be filled in.

Add code formatter

Description

Code formatters are a safety net for the programmer in case he (or his IDE) missed something. They also establish a common standard for everyone who contributes.

Acceptance Criteria

Add scalariform as an sbt plugin
Fix already present errors

Add a Heroku deployment hook

Description

As an initial host we can use Heroku. Not only is it free, it also offers hooks into our CI.
Additionally, Heroku is something nice to learn about.

Acceptance Criteria

Upon pushing into master, our app should be deployed to Heroku.

Implement Splitting Strategy

Description

It is very inefficient to return the matched positions in Lucene. That's why we have to shrink the size of our documents in the database.

Acceptance Criteria

Implement a strategy that splits sequences when they are too long.
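One possible splitting strategy, sketched under assumptions: cut a sequence into windows of at most maxLength characters, and let consecutive windows overlap by k - 1 characters so that every k-mer of the original sequence still lies completely inside at least one window. Each chunk keeps its offset, so a hit inside a chunk can be mapped back to a position in the full sequence. SequenceSplitter, maxLength and k are hypothetical names, not project settings.

```scala
object SequenceSplitter {
  /** A window of the original sequence starting at `offset`. */
  final case class Chunk(offset: Int, sequence: String)

  def split(sequence: String, maxLength: Int, k: Int): Vector[Chunk] = {
    require(k >= 1 && maxLength >= k, "requires 1 <= k <= maxLength")
    if (sequence.length <= maxLength) Vector(Chunk(0, sequence))
    else {
      // Each window covers the k-mers starting in [start, start + step), and
      // consecutive windows share k - 1 characters, so no k-mer is lost.
      val step = maxLength - k + 1
      (0 until sequence.length - k + 1 by step).toVector.map { start =>
        Chunk(start, sequence.slice(start, start + maxLength))
      }
    }
  }
}
```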

Add custom sequence tokenizer

Description

The ultimate goal of the project is to provide a (near) real-time search experience against large sequence data sets such as NCBI. To accomplish this, our indexing process must be a lot smarter. Fortunately, Lucene is extremely customizable.

We should write our own tokenizer which follows best practices of algorithmic/biological pattern matching:

Splitting into k-mers
Also considering the reverse complement

As always at this stage of the project, we don't have to be perfect here. It suffices to make it work without extremely high latencies.

Acceptance Criteria

Create a custom Lucene tokenizer having the 2 mentioned properties
For now, it suffices to only care about DNA sequences and ignore RNA and proteins
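A Lucene-independent sketch of the two properties: the sequence is split into overlapping k-mers, and the reverse complement is taken into account by emitting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), so a sequence and its reverse complement produce the same tokens. Canonical k-mers are one common technique, not necessarily what the project will use; wrapping this logic in an actual Lucene Tokenizer is left out, and KmerTokens is a hypothetical name.

```scala
object KmerTokens {
  // DNA only, as required by this issue; anything else is rejected.
  private def complement(c: Char): Char = c match {
    case 'A'   => 'T'
    case 'T'   => 'A'
    case 'C'   => 'G'
    case 'G'   => 'C'
    case other => throw new IllegalArgumentException(s"not a DNA base: $other")
  }

  def reverseComplement(kmer: String): String =
    kmer.reverse.toList.map(complement).mkString

  /** The lexicographically smaller of a k-mer and its reverse complement. */
  def canonical(kmer: String): String = {
    val rc = reverseComplement(kmer)
    if (kmer.compareTo(rc) <= 0) kmer else rc
  }

  /** Overlapping k-mers as (start position, canonical token) pairs. */
  def tokenize(dna: String, k: Int): Vector[(Int, String)] = {
    require(k >= 1, "k must be positive")
    dna.toUpperCase
      .sliding(k)
      .zipWithIndex
      // sliding yields one shorter window for inputs shorter than k; skip it.
      .collect { case (kmer, pos) if kmer.length == k => (pos, canonical(kmer)) }
      .toVector
  }
}
```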

Migrate to JMH from Scalameter

Description

Focusing on ScalaMeter was a premature decision: it turned out not to be a mature benchmarking framework. A small investigation into JMH was very successful. Additionally, JMH seems to be the industry standard.

Acceptance Criteria

Migrate benchmarks to JMH
Add SBT integration
Document how to execute benchmarks

Add Travis CI integration

Description

We want to show the world that we care about stability. A visual proof that our tests are green is a great thing to have.

Acceptance Criteria

Integrate Travis CI into the project
For now, it suffices to only run Scala tests

Create a React component for the search bar

Description

As a first visual step towards a search engine, a search bar is needed. This issue should also serve as a blueprint for how to integrate the JavaScript ecosystem into the project.

Acceptance Criteria

When loading the website, the user should be greeted with a search bar.

Create REST endpoint to search for sequences in the index

Description

Create REST endpoint to search for sequences in the index.

Acceptance Criteria

There should be a REST call for: /index/search

Fix failing test

Description

https://travis-ci.org/peri4n/bIO

Acceptance Criteria

Should be green again.
