pp2's People

Contributors

mumichae, nagam11, octopuscat88, waguramu

pp2's Issues

Priorities

  • presentation is the most important!
  • compare databases (does redundancy make a difference)
    • neural networks
    • neural translators
  • change FFNN architecture
  • UMAP embedding
    • architecture for 2D vectors
  • bootstrapping + standard error
  • add previous efforts and failures
  • refine plotting
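
For the "bootstrapping + standard error" item, a minimal sketch of the idea might look like the following. It assumes the metric of interest is a mean over per-sample scores; the function name `bootstrap_standard_error`, the number of resamples, and the example scores are all hypothetical and not taken from the repo.

```python
import random
import statistics


def bootstrap_standard_error(values, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean of `values` by bootstrapping."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n items with replacement and record the mean of the resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    # The spread of the resample means approximates the standard error.
    return statistics.stdev(means)


# Hypothetical per-sample scores, for illustration only.
scores = [0.71, 0.64, 0.80, 0.75, 0.69, 0.77, 0.73, 0.66]
se = bootstrap_standard_error(scores)
print(f"mean={statistics.fmean(scores):.3f} se={se:.3f}")
```

The resulting value could serve as the error-bar height in the barplots. A fixed seed keeps the estimate reproducible between runs.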

Clean repo

As Marla mentioned, the repo is messy and can become confusing. I have created a branch called clean_repo in which I am removing all unnecessary directories and files.

  • remove biovec
  • merge output and results (do we want to keep the results or are we creating new ones?)
  • remove redundant scripts

Paper-worthy model

  1. new PPI data set
    • PiSITE (write a pisite2fasta parser)
    • constraints on PPI (info for stratification, only for simple FFNN)
  2. embedding/dimensionality reduction
  3. model architecture
    • neural translator (LSTM + attention)
    • ResNet (residual network)
    • LSTM (directly from sequence), GRU (might be tricky)
    • attention learning (remembers more of the input)

Models

  • simple FFNN + Word2Vec
  • simple FFNN + UMAP
  • neural translator + no extra embedding (default)

All models run on both small and big datasets.

Visualisation

How to visualise the training and predictions. This will be included in the presentation.

Training (Validation set)

  • accuracy
  • loss

Prediction Performance (Test set)

  • accuracy, F1, Q2, ...
  • ROC + AUC
  • PR curve + AUPR
  • Confusion matrix

all as barplots with standard errors
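
As a rough sketch of how the test-set metrics above relate to the confusion matrix, the snippet below derives accuracy, precision, recall and F1 from binary labels. It assumes a two-class task with labels 0 and 1, in which case Q2 (two-state accuracy) coincides with plain accuracy. The function names and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,  # equals Q2 in the two-class case
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred), metrics)
```

ROC and PR curves need continuous prediction scores rather than hard labels, so they are left out of this sketch.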

Create a new pipeline for PPI

Create a new pipeline using your own architecture for protein prediction. Use word embeddings.

  • new architecture
  • new ML library
  • parameters for word2vec
  • different classifier
  • new validation
  • new plotting

Optimize Vector Sizes

Task

  • provide and choose several (N) vector sizes
  • extract features via biovec for each of the N vector sizes
  • train model with cross-validation
  • plot different metrics
  • analyse and explain results
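
The task steps above could be organised roughly as in the sketch below: a k-fold split that is reused for every candidate vector size, with the mean validation score collected per size. Feature extraction and model training are deliberately stubbed out. `train_and_score` is a hypothetical callback, not a biovec or repo function, and the vector sizes shown are placeholders.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def evaluate_vector_sizes(vector_sizes, n_samples, train_and_score, k=5):
    """Return the mean validation score for each candidate vector size."""
    results = {}
    for size in vector_sizes:
        scores = [
            train_and_score(size, train, val)
            for train, val in k_fold_indices(n_samples, k)
        ]
        results[size] = sum(scores) / len(scores)
    return results


def dummy_train_and_score(size, train_idx, val_idx):
    """Placeholder: a real version would extract features of the given
    size, train on train_idx and return a metric measured on val_idx."""
    return len(val_idx) / (len(train_idx) + len(val_idx))


# Hypothetical vector sizes and sample count, for illustration only.
results = evaluate_vector_sizes([50, 100, 200], 20, dummy_train_and_score)
print(results)
```

Using the same seed for every vector size keeps the folds identical across sizes, so differences in the scores reflect the vector size rather than the split.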

Plotting

  • define input format
  • training vs. validation metrics
  • validation metrics across different vector sizes
