Giter Club home page Giter Club logo

viennarna.jl's Introduction

Julia interface to the ViennaRNA library

Build Status Aqua QA

Unofficial Julia interface to the ViennaRNA library for RNA secondary structure prediction and analysis. Please cite the original ViennaRNA publications if you use this library.

Installation

Install ViennaRNA from the Julia package REPL, which can be accessed by pressing ] from the Julia REPL:

add ViennaRNA

Usage

using ViennaRNA, Unitful

The Unitful library is needed to be able to specify units with @u_str, e.g. 4u"kcal/mol" or 37u"°C". You can get the degree symbol ° by typing \degree and pressing the TAB key in the REPL or in an editor with Julia syntax support.

The original C API functions can be found in the submodule ViennaRNA.LibRNA. Most functions can be called with a String containing the RNA sequence instead of a FoldCompound, e.g. mfe("GGGAAACCC").

FoldCompound

A FoldCompound encapsulates nucleic acid strands and model details, such as energy parameters, temperature, etc.

fc = FoldCompound("GGGGGAAAAACCCCCC";
                  options=[:mfe, :pf],
                  temperature=37u"°C",
                  uniq_ML=true,
                  circular=false)

Important keyword arguments

  • options is a subset of [:default, :eval_only, :hybrid, :mfe, :pf, :window].

  • temperature is used to rescale the free energies with the formula ΔG = ΔH - TΔS (the energy parameter sets contain enthalpy and entropy contributions). The default is 37u"°C"

Model details (additional keyword arguments):

  • circular: determines if the RNA strand is circular, i.e. the 5'-end and 3'-end are covalently bonded. Default is false.
  • dangles: how to treat dangling base pairs in multiloops and the exterior loop. Can be 0, 1, 2, or 3. See ViennaRNA docs for details. Default is 2.
  • gquadruplex: allow G-quadruplexes in predictions. Default is false.
  • log_ML: use logarithmic energy model for multiloops. Default is false.
  • max_bp_span: maximum number of bases over which a basepair can span. Default value is -1 (which means unlimited).
  • min_loop_length: the minimum size of a loop (without the closing base pair). Default is 3.
  • no_GU_basepairs: disallow G-U basepairs. Default is false.
  • no_GU_closure: disallow G-U basepairs as closing pairs for loops. Default is false.
  • no_lonely_pairs: disallow isolated base pairs. Default is false.
  • special_hairpins: use special hairpin energies for certain tri-, tetra- and hexloops. Default is true.
  • uniq_ML: use unique decomposition for multiloops, needed for sample_structures and subopt. Default is false.
  • window_size: window size to be used for local calculations performed in a window moving over the sequence. This value is ignored unless the :window option is set in the FoldCompound options. The default value for window_size is -1.

Changing the energy parameter set

ViennaRNA stores energy parameters in global variables after loading them from a file. Each time a new FoldCompound is created, the parameters are copied from the global variables and saved inside the FoldCompound.

The global variables storing energy parameters can be changed by calling a function specific to each parameter set, or via a Symbol with ViennaRNA.params_load(:RNA_Turner1999). Subsequent calls to FoldCompound will use the new parameters and store a copy of the parameters in the newly created FoldCompound.

The default energy set loaded on startup is :RNA_Turner2004.

ViennaRNA.params_load_defaults()  # default is :RNA_Turner2004
ViennaRNA.params_load_DNA_Mathews1999()
ViennaRNA.params_load_DNA_Mathews2004()
ViennaRNA.params_load_RNA_Andronescu2007()
ViennaRNA.params_load_RNA_Langdon2018()
ViennaRNA.params_load_RNA_Turner1999()
ViennaRNA.params_load_RNA_Turner2004()
ViennaRNA.params_load_RNA_misc_special_hairpins()

ViennaRNA.params_load(:DNA_Mathews2004)
# Options are
#   :DNA_Mathews1999, :DNA_Mathews2004,
#   :RNA_Andronescu2007, :RNA_Langdon2018,
#   :RNA_Turner1999, :RNA_Turner2004

Multiple strands

Mutiple strands can be given by separating them with an &, e.g. FoldCompound("GGGG&CCCC").

Comparative folding with an MSA (alifold)

Pass multiple sequences to the FoldCompound constructor for comparative mode (alifold): FoldCompound(["GG-GAAAACCCC", "GCCGAAA-CGGC"]).

It is currently not possible to have multiple strands in alifold mode.

Minimum free energy structure (MFE)

# please excuse the excess precision printed when displaying -9.4 kcal/mol
mfe(fc)  # => ("(((((.....))))).", -9.399999618530273 kcal mol^-1)

Partition function

partfn(fc)  # => ("(((((.....})))),", -9.81672180213034 kcal mol^-1)

Free energy change of folding into a structure

energy(fc, "((((.......)))).")  # => -6.199999809265137 kcal mol^-1

Basepair probabilities

bpp(fc)  # => 16×16 Matrix{Float64}

Boltzmann probability of a structure

prob_of_structure(fc, "(((((.....))))).")  # => 0.5085737925408758

Ensemble defect

ensemble_defect(fc, "(((((.....))))).")  # => 0.33085374128228884

Sample structures (probabilistic / stochastic backtrack)

Sample from Boltzmann ensemble of secondary structures

sample_structures(fc)                         # => [ "((((......)))).." ]
sample_structures(fc; options=:nonredundant,
                      num_samples=20)         # => 20-element Vector{String}

Suboptimal structures

All suboptimal structures with energies delta above the MFE structure

subopt(fc; delta=4u"kcal/mol")  # => Vector{Tuple{String, Quantity}}

Suboptimal structures with the method of Zuker

subopt_zuker(fc)  # => Vector{Tuple{String, Quantity}}

Sliding window prediction of MFE

mfe_window saves all the results in a Vector.

seq = "G"^50 * "A"^4 * "C"^50
mfe_window(seq; window_size=30)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
mfe_window(fc)  # => Vector{ResultWindowMFE}

mfe_window_channel returns a Channel that can be used to iteratively process the results.

seq = "G"^50 * "A"^4 * "C"^50
chan = mfe_window_channel(seq; window_size=30)
take!(chan)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
chan = mfe_window_channel(fc)
take!(chan)

Neighboring structures

Move set to reach neighboring structures of a given structure

neighbors(fc, Pairtable("((.....))"))  # => Vector{Vector{Tuple{Int,Int}}}

Basepair distance between secondary structures

bp_distance("....", "(())")  # => 2

Tree edit distance between secondary structures

tree_edit_dist("(..)", "....")  # => 4.0f0

Mean basepair distance

Mean basepair distance of all structures to each other, weighted by the structure's Boltzmann probabilities

mean_bp_distance(fc)  # => 5.266430215905888

Centroid structure

Centroid structure of ensemble: structure with smallest sum of base-pair distances weighted by Boltzmann probabilities:

centroid(fc)  # => ("(((((.....))))).", 4.799131457924728)

Maximum expected accuracy (MEA) structure

The gamma parameter trades off specificity (low gamma) and sensitivity (high gamma).

mea(fc; gamma=1.0)  # => ("(((((.....))))).", 10.706348f0)

Heat capacity calculation

# starting temperature, end temperature, temperature increment
heat_capacity(fc, 10u"°C", 60u"°C")  # => Vector{Tuple{Quantity,Quantity}}
heat_capacity(fc, 10u"°C", 60u"°C", 1u"°C"; mpoints=5)

Plotting structures

# plot coordinates of a secondary structure, returns two arrays with
# x and y coordinates
plot_coords("(((...)))")  # => Tuple{Float32[], Float32[]}

See PlotRNA.jl for more secondary structure plotting functionality.

Inverse folding / sequence design

inverse_fold("AAAAAAA", "((...))")     a# => ("GCAAAGC", 2.0f0)
inverse_pf_fold("AAAAAAA", "((...))")  # => ("GCCAAGC", 2.0244526863098145 kcal mol^-1)

Seeding the random number generator

ViennaRNA.init_rand_seed(42)

Modified bases energy parameter presets

Energy parameters for modified bases can be used via ViennaRNA's soft constraints mechanism.

using ViennaRNA
fc = FoldCompound("AAACCCUUU")
partfn(fc)  # -0.0025467473022687203 kcal mol^-1
ViennaRNA.sc_mod_pseudouridine!(fc, [7,8,9])  # modify positions 7, 8, 9
partfn(fc)  # -0.004713416050703315 kcal mol^-1

These functions are currently available:

sc_mod_7DA!
sc_mod_dihydrouridine!
sc_mod_inosine!
sc_mod_m6A!
sc_mod_pseudouridine!
sc_mod_purine!

Please refer to the ViennaRNA section on modified bases for more details.

Reducing memory usage

When creating many FoldCompounds, running finalize manually will avoid excessive memory buildup.

for i = 1:100_000
    fc = FoldCompound("ACGU")
    # do something with fc
    finalize(fc)
end

Related Julia packages

viennarna.jl's People

Contributors

cossio avatar marcom avatar timoleistner avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

timoleistner

viennarna.jl's Issues

ViennaRNA C API: docstrings have rendering errors in markdown

Example:

?ViennaRNA.LibRNA.vrna_plot_coords

The docstring is rendered with Markdown, which breaks the formatting, as the ViennaRNA C library doesn't use markdown, e.g. VRNA_PLOT_TYPE_SIMPLE is rendered as VRNAPLOTTYPE_SIMPLE.

Solution:

When generating the bindings with Clang.jl (in gen/generator.jl), either the docstrings should be surrounded by triple backticks (but backticks would still need to be escaped with a backslash inside the string) or somehow else be made 'markdown-safe'.

Fails to precompile on CI

I got a project depending on ViennaRNA.jl which fails to precompile (but only in CI jobs) with the following error

┌ Error: /root/.julia/artifacts/725e603448ac6c44bfbbf2df16f3225673d1ff38/include/ViennaRNA/utils/structures.h:140:10: fatal error: 'stdio.h' file not found
└ @ CBinding ~/.julia/packages/CBinding/kBUap/src/context.jl:377
ERROR: LoadError: Errors parsing C code
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] parse!(ctx::CBinding.Context{:c})
    @ CBinding ~/.julia/packages/CBinding/kBUap/src/context.jl:388
  [3] clang_str(mod::Module, loc::LineNumberNode, lang::Symbol, str::String, opts::String)
    @ CBinding ~/.julia/packages/CBinding/kBUap/src/context.jl:481
  [4] var"@c_str"(__source__::LineNumberNode, __module__::Module, exprs::Vararg{Any})
    @ CBinding ~/.julia/packages/CBinding/kBUap/src/context_c.jl:5
  [5] include
    @ ./Base.jl:418 [inlined]
  [6] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
    @ Base ./loading.jl:1318
  [7] top-level scope
    @ none:1
  [8] eval
    @ ./boot.jl:373 [inlined]
  [9] eval(x::Expr)
    @ Base.MainInclude ./client.jl:453
 [10] top-level scope
    @ none:1
in expression starting at /root/.julia/packages/ViennaRNA/7mJxT/src/ViennaRNA.jl:21
in expression starting at /root/.julia/packages/ViennaRNA/7mJxT/src/ViennaRNA.jl:1

So it seems this is not really selfcontained and the container is missing some C dependencies, if I understand correctly

plot_structure_makie is very slow on the first run

See discussion here: #4

What i tried:

Adding the following to plot_structure.jl:
(adapted from https://github.com/JuliaPlots/Makie.jl/blob/master/src/precompiles.jl )

function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    plot_structure_makie("(((...)))")
    plot_structure_makie("(((...)))"; sequence="GGGAAACCC")
    return
end
_precompile_()

Unfortunately the first plot seems to take even longer with this precompile approach (53s vs 45s without).

As mentioned in #4 using the GLMakie & WGLMakie backends also didn't yield an improvment.

Segfault for `plot_coords("()")`

Segfault:

using ViennaRNA
plot_coords("()")  # sometimes doesn't segfault but keeps on going...
plot_coords(Pairtable("()")) # also segfaults
plot_coords("((()))") # also crashes, but in a different way

These work:

using ViennaRNA
plot_coords("")
plot_coords("(.)")
plot_coords("(..)")
plot_coords("(((.)))")
plot_coords("(((..)))")

It crashes on the line

n = _vrna_plot_coords(structure, ptr_cx, ptr_cy, type)

Which in turn calls

LibRNA.vrna_plot_coords(structure, cx, cy, type)

Tests for different plot_type

Works

plot_coords("()"; plot_type=:simple)    # type = 0
plot_coords("()"; plot_type=:naview)    # type = 1
plot_coords("()"; plot_type=:circular)    # type = 2

Crashes (sometimes also seems to just hang)

plot_coords("()"; plot_type=:turtle)    # type = 3
plot_coords("()"; plot_type=:puzzler)    # type = 4

Feature request: Alifold

Hi again,

Are there any plans to implement a wrapper for alifold?
As I'm not familiar with C, I dont really know where to start but I would be thankful if you could lead me to the right direction.
My current try looks like this

function alifold(sequences::Vector{AbstractString}, structure::AbstractString)
    alifold_output = LibRNA.vrna_alifold(sequences, structure)
    return alifold_output
end

Is the following needed somehow to

cstr_structure = Ptr{Cchar}(LibRNA.vrna_alloc(length(structure) + 1))
cstr_sequences = Ptr{Ptr{Cchar}}(LibRNA.vrna_alloc(sizeof(Ptr{Cchar})))

I'm almost certain this is not correct though.
Do I also need to write the wrap_alifold function present in alifold.c?
Do I need a specific struct to return the alifold information?
Would be nice to have this implemented and I would like to help if I can.

Greetings,
Timo

Graphic dependencies are quite heavy

I don't know how essential the visualization functions are for users of this package, but loading the whole Makie stack regardless if you need that or not is quite expensive. For comparison (timings in a temporary environment after precompilation):
[email protected]:

julia> @time using ViennaRNA
  0.185636 seconds (747.30 k allocations: 48.920 MiB, 5.63% gc time, 6.60% compilation time)

[email protected] (includes Luxor):

julia> @time using ViennaRNA
  0.700532 seconds (2.55 M allocations: 171.304 MiB, 5.47% gc time, 6.65% compilation time)

[email protected] (includes CairoMakie):

julia> @time using ViennaRNA
  4.242130 seconds (14.88 M allocations: 1010.827 MiB, 9.37% gc time, 21.10% compilation time)

I'm happy to spend an Plots.jl recipe, if thats of interest until Makie has lightweight recipes itself, otherwise I'd prefer the visualization stuff to live in a separate package, such that you can use the other functionality without pulling a large amount of dependencies in.
Another alternative would be to use Requires.jl.

Basepair probability vector and plotting RNA structures

Would be nice to have a function which returns the basepair probabilities as a vector so it can be used later to label visualizations with the right colors.
It seems like the RNAFold webserver does this with a perl script which isnt particularly easy to read.
This is the function I came up with:

function bpp_vec(bppm::Matrix{Float64}, dotbracket::String)
    double_bppm = transpose(bppm) + bppm
    prob_vec = Vector{Float64}(undef, first(size(double_bppm)))
    for (i, row) in enumerate(eachrow(double_bppm))
        if dotbracket[i] in ('(', ')')
            prob_vec[i] = maximum(row)
        else
            prob_vec[i] = 1 - maximum(row)
        end
    end
    return prob_vec
end

I'm not 100% sure this is the correct way of extracting this information though.
I've also seen that you still have this in your TODO list so I'm not sure if or how this function would work after this change.

Thanks alot for porting the ViennaRNA package to julia btw. 🙏. Very nicely done!

Windowed predictions

Following discussion in #7 (comment)

For sliding window predictions on a sequence that produce an array of results instead of just one (one result for each window).

To use windowed predictions we have to do the following:

  • call vrna_fold_compound() with option VRNA_OPTION_WINDOW
  • set the window size set with vrna_md_t.window_size
  • use the functions vrna_mfe_window_cb, vrna_mfe_window_zscore_cb, vrna_probs_window

A Julia interface could look like this:

  • mfe_window(cb::Function, fc::FoldCompound, ...)
  • mfe_window(fc::FoldCompound, ...)
  • probs_window(cb::Function, fc::FoldCompound, ...)
  • probs_window(fc::FoldCompound, ...)

With the callback function in the first position so we can use do-notation to pass custom callback functions.
The second version of each function without the callback just stores all the results in an array and returns that (by calling the first version with a callback that pushes each result to an array).

This style of callback function would also be something useful for subopt, pbacktrack, and maybe other functions where ViennaRNA accepts callback functions.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

DNA parameters

Love the interface of ViennaRNA for Julia. I've just installed it and I am still testing all the functions.
For now I seem unable to run FoldCompound using either of the DNA parameters. All RNA parameters work.
I've gone through the code to see if I could pick any obvious glitches but I cannot find any faults there. I looked through the ViennaRNA package and both are there so I don't fully understand why I am being pushed onto "unknown energy parameters" error.

Missing functionality from ViennaRNA

Some of these items might already work, but should still get their own testset in the tests/ dir.

ViennaRNA-2.5.0 and before

  • default temperature for DNA params is 37 °C (like for RNA), change to better default

Callback functions

  • subopt
  • pbacktrack
  • neighbours
  • more?

Windowed predictions

See also: #9

  • pf_window
  • pf_window_channel
  • bpp_window (using vrna_probs_window)
  • bpp_window_channel
  • zscore

Soft and hard constraints

  • soft constraints
  • hard constraints

G-quadruplexes

Duplex predictions

  • strand concentrations?

Multistrand predictions

  • strand concentrations

RNA-DNA hybrids

  • only duplexes?

Grammar callbacks

  • callbacks at various stages of recursions

ViennaRNA-2.6.0

https://github.com/ViennaRNA/ViennaRNA/releases/tag/v2.6.0

  • energies for modified bases (uses soft constraints callback), energy parameter modifications stored as json
  • adjust energies to different salt concentrations (Na+)
  • new soft constraints multi-callback dispatcher

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.