bio's Introduction

Hi there 👋

  • 🔭 I’m currently working on becoming my best self
  • 🌱 I’m currently learning Haskell and Rust
  • 👯 I’m looking to collaborate on any Open Source I find interesting
  • 🤔 I’m looking for help with my OpenSchool project
  • 💬 Ask me about mechanical keyboards :)
  • 😄 Pronouns: He/Him

bio's People

Contributors

correyl

Forkers

correyl

bio's Issues

Add performance tests for the FastaProcessor

Description

Several parts of our webapp, such as the FastaProcessor, will have performance requirements. We need to investigate possible ways to test them.

Acceptance Criteria

  • Come up with a way to measure the performance of test code (e.g. ScalaMeter)

Create an initial draft of a FASTA parser

Description

FASTA might not be a complex format, but it still poses a lot of challenges for asynchronous parsers. Come up with a prototype of an asynchronous parser that uses Akka streams.

Acceptance Criteria

  • It should parse "normal" FASTA files. No special format has to be supported as of now.
  • Good test coverage
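
As a starting point for the prototype, the record grouping can be sketched in plain Scala. This is not the Akka streams parser the issue asks for: it is a synchronous sketch, and the names `FastaSketch`, `FastaRecord` and `parse` are hypothetical. It assumes that a header line starts with `>` and that all following lines up to the next header belong to the same sequence.

```scala
object FastaSketch {
  // One FASTA record: the header line without the leading '>' and the joined sequence.
  final case class FastaRecord(header: String, sequence: String)

  // Groups lines into records. Lines before the first header are ignored in this sketch.
  def parse(lines: Iterator[String]): Vector[FastaRecord] = {
    val records = Vector.newBuilder[FastaRecord]
    var header: Option[String] = None
    val sequence = new StringBuilder

    // Emits the record collected so far (if any) and resets the sequence buffer.
    def flush(): Unit = {
      header.foreach(h => records += FastaRecord(h, sequence.toString))
      sequence.clear()
    }

    for (raw <- lines) {
      val line = raw.trim
      if (line.startsWith(">")) {
        flush()
        header = Some(line.drop(1).trim)
      } else if (line.nonEmpty && header.isDefined) {
        sequence.append(line)
      }
    }
    flush()
    records.result()
  }
}
```

In a streaming version, the same grouping logic would probably sit behind a stage that first splits the incoming bytes into lines.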

Support multiple indices

Description

We have to support multiple indices to allow users to query different data sets.

Acceptance Criteria

  • The REST endpoints for creating an index should be: POST /index/1

Improve stability of FastaProcessor

Description

While playing with the FileUpload feature it became clear that the FastaProcessor is not stable enough to be used in production (it is just a prototype). Some edge cases that lead to trouble are:

  • Empty FASTA file / bytestring
  • Add more FASTA files to the tests to prove stability against other FASTA styles
  • Improve error handling
  • Allow for comments in the file

Acceptance Criteria

  • Improve stability on all edge cases
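
Some of the edge cases above could be caught in a cleaning step that runs before the actual parsing. The sketch below is one possible shape, not the FastaProcessor itself: `FastaInputCheck` and `clean` are hypothetical names, and treating lines that start with `;` as comments is an assumption about the comment style to support.

```scala
object FastaInputCheck {
  // Returns the cleaned lines, or a description of the problem.
  def clean(input: String): Either[String, Vector[String]] = {
    val lines = input.linesIterator
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith(";")) // assumption: ';' starts a comment
      .toVector

    if (lines.isEmpty) Left("input contains no FASTA records")
    else if (!lines.head.startsWith(">")) Left("sequence data found before the first header")
    else Right(lines)
  }
}
```

Returning an `Either` keeps the error case explicit, so the caller decides how to report it to the user.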

Add REST endpoint for index status

Description

It would be very helpful to have an impression of the current status of the index (e.g. how many sequences are indexed?).

Acceptance Criteria

  • Add a REST endpoint for the current status of the index

Document the REST API

Description

The webapp should be almost entirely controllable via REST. As of now, there are only a few controllers but as this number increases we have to document them systematically.

Acceptance Criteria

  • Investigate how to document our REST API (e.g. Swagger)
  • Document all present REST controllers with the found solution

Create documentation

Description

Write about:

  • A quick overview:
    • What does this project aspire to do?
  • Information about the directory structure:
    • How is the project structured?
  • How to get started developing.
    • How to set it up (for now)?
  • Optional: How to contribute?

Acceptance Criteria

  • Documentation should be added.
  • Optional: Look for a proof-reading tool and apply it.

Document the algorithmic approach

Description

We now have a relatively clear picture of how our matching algorithm should work. To really think it through it may be best to document it.

Acceptance Criteria

  • Document our matching algorithm as it is currently planned.

Create a File Upload component

Description

This should serve as a mock for a potential (FASTA) file upload, so that users can search our index against the contents of their file. Because there is no index as of now, it suffices to show the content of the file to check whether our FASTA processor really works/performs.

Acceptance Criteria

  • When the file is uploaded, all its contents are shown on the page.

Experiment with the search function

Description

Our search function works, but we are not 100% sure how. For example, fuzzy matching works to a certain extent, but we have to be sure to which extent.

Acceptance Criteria

  • Create tests that give an impression of how the search functionality works.
  • If you are certain about a specific behaviour, create a test for it.

Add Lucene sink

Description

To really make use of our super cool FileUpload pipeline, we have to create a sink to pipe all the incoming stuff into, which at the far end is Lucene.
That is why we have to come up with a custom sink for this.

Acceptance Criteria

  • Prototype a Lucene sink

Create a style draft

Description

At some point in time we have to care about styling. This issue should prototype a very basic style to see how we can integrate it into our build process.

Acceptance Criteria

  • All text should be written in a fancy font :)

Switch to compile-time injection

Description

I (personally) dislike runtime dependency injection. I want the compiler to check if everything is wired correctly. This ticket should migrate the application from the Play default (runtime) to reader monads (compile-time).

This may not be super urgent but it is easier to switch now than when the application is large.

Acceptance Criteria

  • All things should still work
  • Remove Guice from the webapp dependencies
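
To illustrate what compile-time wiring could look like, here is a minimal sketch that reads "read-monads" as the reader monad pattern. Everything in it is hypothetical (`Reader`, `Env`, `SequenceIndex`, `statusLine`); a real migration would more likely use an existing library type than a hand-written `Reader`.

```scala
object ReaderSketch {
  // A computation that needs an environment of type E to produce an A.
  final case class Reader[E, A](run: E => A) {
    def map[B](f: A => B): Reader[E, B] = Reader[E, B](env => f(run(env)))
    def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
      Reader[E, B](env => f(run(env)).run(env))
  }

  // Hypothetical dependencies, for illustration only.
  trait SequenceIndex { def count: Int }
  final case class Env(index: SequenceIndex, appName: String)

  val sequenceCount: Reader[Env, Int] = Reader((env: Env) => env.index.count)
  val appName: Reader[Env, String] = Reader((env: Env) => env.appName)

  // Wiring happens through types: this compiles only if Env provides both dependencies.
  val statusLine: Reader[Env, String] =
    for {
      name  <- appName
      count <- sequenceCount
    } yield s"$name: $count sequences indexed"
}
```

Nothing is resolved until `statusLine.run(env)` is called with a concrete `Env`, so a missing dependency shows up as a compile error rather than a runtime failure.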

Add command-line parser

Description

We need a way of specifying parameters of our project. E.g., our configuration can host production and test databases, but the database that is actually used at runtime still has to be specified.

Acceptance Criteria

  • Add a command-line parser to the project and evaluate properties like the database to be used.
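
A hand-rolled sketch of the idea, assuming a single hypothetical `--database=<name>` argument with `test` as the default. The names `CliSketch` and `Settings` are made up for illustration; an existing command-line parsing library could replace the manual matching.

```scala
object CliSketch {
  // Hypothetical settings; only the database choice is taken from the issue text.
  final case class Settings(database: String = "test")

  // Accepts arguments of the form --database=<name>; anything else is reported as an error.
  def parse(args: Seq[String]): Either[String, Settings] =
    args.foldLeft[Either[String, Settings]](Right(Settings())) { (acc, arg) =>
      acc.flatMap { settings =>
        arg.split("=", 2) match {
          case Array("--database", name) if name.nonEmpty =>
            Right(settings.copy(database = name))
          case _ =>
            Left(s"unrecognised argument: $arg")
        }
      }
    }
}
```

For example, `CliSketch.parse(Seq("--database=production"))` yields `Right(Settings("production"))`, while an unknown flag yields a `Left` with an error message.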

Migrate to Akka Http

Description

After spending a month digging deep into the Play framework and Akka, I can hardly see an advantage in using Play instead of just using Akka Http. A lot of things should get simpler when we switch to Akka Http entirely, such as compile-time dependency injection.

Acceptance Criteria

  • Migrate every controller to Akka Http

Create webapp subproject

Description

Further down the road I want this project to have a UI. This should be done with a webapp. To clearly separate the logic of the webapp from other parts of the project we have to create a subproject.

Acceptance Criteria

  • A Play subproject should be created.
  • You can test the webapp via sbt webapp/run

Separate the index logic into another subproject

Description

The Lucene indexing logic (its tokenizers and so on) is independent of Akka streams and our domain models. Therefore, it should be separated into a subproject (named index).

Acceptance Criteria

  • Create a subproject that contains all the Lucene indexing logic

Add ability to search for the reverse complement

Description

In issue #25 we investigated the possibility of also indexing the reverse complement. It turned out that this is the wrong approach. Instead, we should additionally reverse complement the search query and search with both queries against the same index.

Acceptance Criteria

  • Add a checkbox in the UI that toggles whether the reverse complement should also be searched for.
  • Adjust the REST endpoint for searching accordingly.
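
The query-side approach can be sketched as follows for DNA. `ReverseComplementSketch`, `reverseComplement` and `queries` are hypothetical names, and the `includeReverseComplement` flag stands in for the checkbox state. Upper-casing the query and passing unknown characters through unchanged are assumptions of this sketch.

```scala
object ReverseComplementSketch {
  private val complement: Map[Char, Char] =
    Map('A' -> 'T', 'T' -> 'A', 'C' -> 'G', 'G' -> 'C', 'N' -> 'N')

  // Reverses the query and complements each base; unknown characters are kept as-is.
  def reverseComplement(query: String): String =
    query.toUpperCase.reverse.map(base => complement.getOrElse(base, base))

  // The queries to run against the index, depending on the checkbox state.
  def queries(query: String, includeReverseComplement: Boolean): List[String] = {
    val forward = query.toUpperCase
    if (includeReverseComplement) List(forward, reverseComplement(forward)).distinct
    else List(forward)
  }
}
```

The `distinct` call avoids searching twice when a query is its own reverse complement, such as `ACGT`.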

Create a search component

Description

The user needs to submit search queries via the UI. We have to build a component for this.

Acceptance Criteria

  • Reintroduce the search bar component

Add Issue templates

Description

It is becoming tedious to write the same scaffold over and over again. GitHub supports issue-templates so we should use it.

Acceptance Criteria

  • Whenever a new issue is created our scaffold should already be filled in.

Add code formatter

Description

Code formatters are a safety net for programmers in case they (or their IDE) missed something. They also provide a common standard for everyone who is contributing.

Acceptance Criteria

  • Add scalariform as an sbt plugin
  • Fix already present errors

Add a Heroku deployment hook

Description

As an initial host, we can use Heroku. It is not only free, but there are also hooks into our CI.
Additionally, Heroku is something nice to learn about.

Acceptance Criteria

  • Upon pushing into master, our app should be deployed to Heroku.

Implement Splitting Strategy

Description

It is very inefficient to return the matched positions in Lucene. That's why we have to shrink the size of our documents in the database.

Acceptance Criteria

  • Implement a strategy that splits sequences if they are too long.
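
One possible strategy, sketched under assumptions the issue does not state: chunks have a fixed maximum length, neighbouring chunks overlap by a few bases so that a match spanning a boundary is not lost, and each chunk remembers its offset so that a hit can be mapped back to the original sequence. `SplitSketch`, `Chunk`, `maxLength` and `overlap` are hypothetical names.

```scala
object SplitSketch {
  // A piece of a longer sequence together with its start position in the original.
  final case class Chunk(offset: Int, bases: String)

  // Splits into chunks of at most maxLength bases; neighbours share `overlap` bases.
  def split(sequence: String, maxLength: Int, overlap: Int): Vector[Chunk] = {
    require(maxLength > 0, "maxLength must be positive")
    require(overlap >= 0 && overlap < maxLength, "overlap must be in [0, maxLength)")
    if (sequence.length <= maxLength) Vector(Chunk(0, sequence))
    else {
      val step = maxLength - overlap
      Iterator
        .iterate(0)(_ + step)
        // Stop once a chunk would contain no bases beyond the previous chunk.
        .takeWhile(start => start == 0 || start + overlap < sequence.length)
        .map(start => Chunk(start, sequence.slice(start, start + maxLength)))
        .toVector
    }
  }
}
```

For a useful overlap, it would have to be at least as long as the longest query minus one base; whether that is practical depends on the query lengths we want to support.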

Add custom sequence tokenizer

Description

The ultimate goal of the project is to provide a (near) real-time search experience against large sequence data sets such as NCBI. To accomplish this, our indexing process must be a lot smarter. Fortunately, Lucene is extremely customizable.

We should write our own tokenizer which follows best practices of algorithmic/biological pattern matching:

  • Splitting into k-mers
  • Also considering the reverse complement

As always at this stage of the project, we don't have to be perfect here. It suffices to make it work without extremely high latencies.

Acceptance Criteria

  • Create a custom Lucene tokenizer having the 2 mentioned properties
  • For now, it suffices to only care about DNA sequences and ignore RNA and proteins
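
The k-mer splitting itself is simple and can be sketched independently of Lucene. The snippet below is plain Scala, not a Lucene tokenizer; `KmerSketch` and `kmers` are hypothetical names, and the choice of `k` is left open.

```scala
object KmerSketch {
  // All overlapping substrings of length k, in order of their start position.
  def kmers(sequence: String, k: Int): Vector[String] = {
    require(k > 0, "k must be positive")
    // sliding would yield one shorter window for inputs below k, so guard against it.
    if (sequence.length < k) Vector.empty
    else sequence.sliding(k).toVector
  }
}
```

A custom tokenizer would emit these k-mers as tokens; covering the reverse complement would mean additionally emitting the k-mers of the reverse-complemented sequence.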

Migrate to JMH from ScalaMeter

Description

Focusing on ScalaMeter was a premature decision. It turned out not to be a mature benchmark framework. A small investigation into JMH turned out to be very successful. Additionally, JMH seems to be the industry standard.

Acceptance Criteria

  • Migrate benchmarks to JMH
  • Add SBT integration
  • Document how to execute benchmarks

Add Travis CI integration

Description

We want to show the world that we care about stability. A visual proof that our tests are green is a great thing to have.

Acceptance Criteria

  • Integrate Travis CI into the project
  • For now, it suffices to only run Scala tests

Create a React component for the search bar

Description

As a first visual step towards a search engine, a search bar is needed. This issue should also serve as a blueprint for how to integrate the JavaScript ecosystem into the project.

Acceptance Criteria

  • When loading the website, the user should be greeted with a search bar.
