Awesome Large Biology Models


Awesome Papers and Codes

DNA Models

| Title | Venue | Date | Code | Notes |
| --- | --- | --- | --- | --- |
| Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation | bioRxiv | 2023-09-01 | - | - |
| EpiGePT: a Pretrained Transformer model for epigenomics | bioRxiv | 2023-07-15 | Website | - |
| DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks | arXiv | 2023-07-11 | GitHub | - |
| HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution | arXiv | 2023-06-27 | GitHub | Blog |
| DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome | arXiv | 2023-06-26 | GitHub | GUE |
| The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics | bioRxiv | 2023-01-11 | GitHub | - |
| GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics | bioRxiv | 2022-10-11 | - | - |
| Effective gene expression prediction from sequence by integrating long-range interactions | Nature Methods | 2021-10-04 | GitHub | - |
| DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome | Bioinformatics | 2021-02-04 | GitHub | - |

RNA Models

| Title | Venue | Date | Code | Notes |
| --- | --- | --- | --- | --- |
| An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics | bioRxiv | 2023-09-26 | - | - |
| CodonBERT: Large Language Models for mRNA Design and Optimization | arXiv | 2023-09-09 | - | - |
| Uni-RNA: Universal Pre-trained Models Revolutionize RNA Research | bioRxiv | 2023-07-11 | - | - |
| Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction | arXiv | 2023-01-31 | GitHub | - |

Protein Models

| Title | Venue | Date | Code | Notes |
| --- | --- | --- | --- | --- |
| Genome-wide prediction of disease variant effects with a deep protein language model | Nature Genetics | 2023-08-10 | GitHub | Sequence |
| De novo design of protein structure and function with RFdiffusion | Nature | 2023-07-11 | GitHub | Structure |
| xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein | arXiv | 2022-07-05 | - | Sequence |
| OntoProtein: Protein Pretraining With Gene Ontology Embedding | ICLR 2022 | 2022-06-30 | GitHub | Sequence |
| Accurate proteome-wide missense variant effect prediction with AlphaMissense | Science | 2023-09-19 | GitHub | Sequence |
| Evolutionary-scale prediction of atomic-level protein structure with a language model | Science | 2023-03-23 | GitHub | Sequence |
| Protein complex prediction with AlphaFold-Multimer | bioRxiv | 2022-03-10 | GitHub | Sequence |
| ProtGPT2 is a deep unsupervised language model for protein design | Nature Communications | 2022-07-27 | - | Sequence |
| ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning | IEEE Transactions on Pattern Analysis and Machine Intelligence | 2022-10-01 | GitHub | Sequence |

Single-cell Models

| Title | Venue | Date | Code | Notes |
| --- | --- | --- | --- | --- |
| GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model | bioRxiv | 2023-09-26 | GitHub | - |
| scGPT: Towards building a foundation model for single-cell multi-omics using generative AI | arXiv | 2023-04-30 | GitHub | - |
| scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data | Nature Machine Intelligence | 2023-09-26 | GitHub | - |

Biological Text Models

| Title | Venue | Date | Code | Notes |
| --- | --- | --- | --- | --- |
| GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information | Preprint | 2023-05-16 | GitHub | - |
| BioGPT: generative pre-trained transformer for biomedical text generation and mining | Briefings in Bioinformatics | 2022-12-24 | GitHub | - |

Awesome Datasets

General Datasets

| Name | Paper | Data type | Link | Notes |
| --- | --- | --- | --- | --- |
| NCBI Database | - | DNA/RNA/Protein | Link | - |
| Ensembl Database | - | DNA/RNA/Protein | Link | - |
| UCSC Database | - | DNA/RNA | Link | - |
| RCSB PDB Database | - | Protein | Link | - |
| UniProt Database | - | Protein | Link | - |
| STRING Database | - | Protein | Link | - |

DNA Datasets

| Name | Paper | Data type | Link | Notes |
| --- | --- | --- | --- | --- |
| Genome Understanding Evaluation (GUE) | DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome | Sequence Classification | Link | - |

RNA Datasets

| Name | Paper | Data type | Link | Notes |
| --- | --- | --- | --- | --- |
| Genotype-Tissue Expression (GTEx) | The GTEx Consortium atlas of genetic regulatory effects across human tissues | General | Link | - |
| Master database of All possible RNA sequences (MARS) | The Master Database of All Possible RNA Sequences and Its Integration with RNAcmap for RNA Homology Search | General | Link | - |

Protein Datasets

Name Paper Data type Link Notes
TBD

Single-cell Datasets

Name Paper Data type Link Notes
TBD

Biological Text Datasets

Name Paper Data type Link Notes
TBD

Contributors

dwanzhang-ai, jasonlinjc, mumuyang666, yunlong10
