Giter Club home page Giter Club logo

efold_data's Introduction

eFold training data

Here's a breakdown of the data we used to train eFold. You can find these datasets on Hugging Face. The raw data in on this google drive.

Training stage Name on HuggingFace Source Method Number of sequences Families length [10, 199] length [200, 499] length [500, 999] length [1000, 1999] length [2000, inf]
Pre-training rnacentral_synthetic Sequences from RNA central RNAstructure 226'729 All known families 176'486 49'463 780 0 0
Pre-training ribo500-blast Ribonanza Competition RNAstructure + DMS and/or SHAPE 46'060 Unlabelled 46'049 11 0 0 0
Pre-training bpRNA-1m bpRNA-1m Covariance analysis 66'715 Unlabelled, sRNA, tRNA 48'090 6'167 2'829 9'260 369
Fine-tuning pri_miRNA This work RNAstructure + DMS 1'098 pri-miRNA 0 1'098 0 0 0
Fine-tuning human_mRNA This work RNAstructure + DMS 1'456 mRNA 0 493 882 81 0
Testing PDB PDB NMR, crystallography 355 Short non-coding RNA 343 6 6 0 0
Testing viral_fragments Peer-reviewed literature RNAstructure + DMS 40 Viral RNA 12 17 11 0 0
Testing lncRNA Bugnon and al, 2022 RNAstructure + DMS 30 Long non-coding RNA 0 2 1 27 0
Testing archiveII Archive II Covariance analysis 3'370 rRNA, tRNA, tmRNA, unlabelled 2'004 1'276 79 11 0

Notes

The normalisation, filtering and replicates matching of SEISMIC data is made using seismic__postprocessing.py.

efold_data's People

Contributors

taillades avatar albericdelajarte avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.