
UDSM-LLPS-Syn

In this study, we applied the deep sequence model UDSMProt to two new protein classification tasks:

  1. predict proteins with liquid-liquid phase separation (LLPS) propensity
  2. predict synaptic proteins

Our results show that, without prior domain knowledge and using only protein sequences, the fine-tuned language models achieved high classification accuracy and outperformed baseline models built on compositional k-mer features in both tasks. For details of this work, please refer to our paper "Deep sequence representation learning for predicting human proteins with liquid-liquid phase separation propensity and synaptic functions" (Wei and Wang, 2022).
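As a rough illustration of what compositional k-mer features look like, the sketch below turns a protein sequence into a normalized k-mer frequency vector. It is an assumption-based sketch, not the paper's baseline pipeline: the choice of `k`, the 20-letter alphabet, skipping k-mers with non-standard residues, and the function name `kmer_composition` are all illustrative.

```python
from collections import Counter
from itertools import product
from typing import List

# The 20 standard amino acids; k-mers containing any other letter are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence: str, k: int = 2) -> List[float]:
    """Return the normalized frequency of every possible k-mer in `sequence`.

    The vector has 20**k entries, ordered by itertools.product over AMINO_ACIDS,
    and sums to 1 unless the sequence contains no valid k-mer (all zeros).
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    seq = sequence.upper()
    # Slide a window of length k over the sequence and count valid k-mers.
    counts = Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if all(aa in AMINO_ACIDS for aa in seq[i : i + k])
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]
```

For example, `kmer_composition("AAC", k=2)` returns a 400-dimensional vector with 0.5 at the entries for `AA` and `AC`. Vectors like these can be fed to a standard classifier; see the paper for the baselines actually used.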

Dependencies

Please refer to the original UDSMProt repository for detailed dependency information.

Application Documentation

Users are welcome to use the fine-tuned models from both learning tasks for comparison in their own research.
Here, we provide an example showing how to apply the fine-tuned UDSM-LLPS models in the first learning task. As stated in our paper, in addition to the LLPSDB and PhaSepDB data, we also evaluated the performance of UDSM-LLPS on another well-known database, DrLLPS. DrLLPS is currently the most comprehensive database, with the largest collection of LLPS-associated proteins across 164 eukaryotes. In DrLLPS, LLPS-associated proteins can be browsed by three LLPS types:

  • scaffolds, proteins that can drive or undergo LLPS;
  • clients, proteins that can be recruited by scaffolds for the formation of biomolecular condensates;
  • regulators, proteins that have not been identified to undergo LLPS but have been shown to be involved in regulating LLPS behavior.
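A minimal sketch for tallying DrLLPS entries by these three types, using only the Python standard library. The column name `LLPS Type` and the helper `count_llps_types` are hypothetical; check the actual header of `task_1/application/DrLLPS_data.csv` and adjust before relying on it. Labels are lowercased so that variants such as `Client` and `client` are counted together.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Hypothetical column name: inspect the header of DrLLPS_data.csv and adjust.
TYPE_COLUMN = "LLPS Type"


def count_llps_types(csv_file: TextIO, type_column: str = TYPE_COLUMN) -> Counter:
    """Count records per LLPS type, normalizing labels to lowercase."""
    reader = csv.DictReader(csv_file)
    return Counter(row[type_column].strip().lower() for row in reader)


# Tiny in-memory stand-in for the real file (placeholder IDs, not real data).
demo = io.StringIO("Protein,LLPS Type\nP1,Scaffold\nP2,Client\nP3,client\nP4,Regulator\n")
demo_counts = count_llps_types(demo)
```

On the real file, open it with `open("task_1/application/DrLLPS_data.csv", newline="")` and pass the handle to `count_llps_types`.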

Description of files

  • DrLLPS data: task_1/application/DrLLPS_data.csv stores 3,627 reviewed human LLPS-associated proteins categorized by the three types: 100 scaffolds, 2,998 clients, and 529 regulators.
  • Fine-tuned UDSM-LLPS models: UDSM-LLPS_Random.pkl and UDSM-LLPS_UniRef.pkl under task_1/
  • Utils file: model_utils.py downloaded from the original UDSMProt repository
  • Token file: tok_itos.npy
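`tok_itos.npy` presumably stores an index-to-string ("itos") vocabulary, following the fastai naming convention that UDSMProt builds on. Assuming that, the sketch below shows how a protein sequence could be mapped to token indices at the residue level. The helper names and the `xxunk` unknown token are assumptions; the actual preprocessing, including any special begin-of-sequence tokens, is handled by `model_utils.py` and the notebooks.

```python
from typing import Dict, List, Sequence


def build_stoi(itos: Sequence[str]) -> Dict[str, int]:
    """Invert an index-to-string vocabulary into a string-to-index mapping."""
    return {token: index for index, token in enumerate(itos)}


def numericalize(sequence: str, itos: Sequence[str], unk_token: str = "xxunk") -> List[int]:
    """Map each residue of `sequence` to its index in `itos`.

    Residues missing from the vocabulary map to `unk_token`, which is assumed
    to be present in `itos` (a KeyError is raised otherwise). Special tokens
    such as a begin-of-sequence marker are deliberately not added here.
    """
    stoi = build_stoi(itos)
    unk_index = stoi[unk_token]
    return [stoi.get(residue, unk_index) for residue in sequence.upper()]
```

In practice the vocabulary could be loaded with `itos = list(np.load("tok_itos.npy", allow_pickle=True))` after `import numpy as np`, then inspected to confirm which tokens it actually contains.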

Jupyter Notebook Documentation

Please see the two Jupyter notebooks under task_1/application/ for detailed steps:

  • 1. Predict LLPS propensity of DrLLPS data.ipynb
  • 2. UDSM-LLPS prediction results on DrLLPS data.ipynb
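Because every DrLLPS entry is LLPS-associated, one simple way to summarize model outputs is the fraction of proteins in each LLPS type that receive a positive prediction, which acts as a per-type recall. The sketch below computes that fraction from `(type, score)` pairs. The 0.5 threshold and the function name `positive_rate_by_type` are assumptions; the second notebook remains the reference for how the results are actually analyzed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def positive_rate_by_type(
    records: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Fraction of proteins per LLPS type whose predicted score is >= threshold.

    `records` yields (llps_type, predicted_probability) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for llps_type, score in records:
        totals[llps_type] += 1
        if score >= threshold:
            positives[llps_type] += 1
    # Every type in `totals` has at least one record, so no division by zero.
    return {llps_type: positives[llps_type] / totals[llps_type] for llps_type in totals}
```

Reporting the rate per type rather than pooled keeps the large client group (2,998 proteins) from dominating the smaller scaffold group.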
