Muon.jl

Muon for Julia

Muon is originally a Python library to work with multimodal data. Muon.jl brings the ability to work with the same data structures to Julia.

Muon.jl implements I/O for .h5mu and .h5ad files as well as basic operations on the multimodal objects.

Introduction

Datasets can usually be represented as matrices with values for the variables measured in different samples, or observations. Variables and observations tend to have annotations attached to them; a typical example is metadata annotating samples. Such a dataset, with the matrix at its centre and different kinds of annotations associated with it, can be stored conveniently in an annotated data object, AnnData for short.

Multimodal datasets are characterised by the variables coming from different generative processes. Each of these modalities is an annotated dataset by itself, but they can be managed and analyzed together within a MuData object.
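
The relationship between the two containers can be pictured with a small toy model. This is only a sketch: ToyAnnData, ToyMuData and their fields are hypothetical stand-ins invented for illustration, not the actual Muon.jl types, and real objects carry many more annotation slots.

```julia
# Hypothetical, simplified stand-ins; not the actual Muon.jl types.
struct ToyAnnData
    X::Matrix{Float64}          # observations × variables
    obs_names::Vector{String}   # one name per row of X
    var_names::Vector{String}   # one name per column of X

    function ToyAnnData(X, obs_names, var_names)
        # Annotations must line up with the matrix they describe.
        size(X) == (length(obs_names), length(var_names)) ||
            throw(DimensionMismatch("annotations must match the shape of X"))
        new(X, obs_names, var_names)
    end
end

struct ToyMuData
    mod::Dict{String, ToyAnnData}   # modality name => annotated dataset
end

# Access a modality by name, mirroring the mdata["rna"] syntax.
Base.getindex(m::ToyMuData, name::AbstractString) = m.mod[name]

cells = ["cell1", "cell2", "cell3"]
rna  = ToyAnnData(rand(3, 4), cells, ["geneA", "geneB", "geneC", "geneD"])
atac = ToyAnnData(rand(3, 2), cells, ["peak1", "peak2"])
mdata_toy = ToyMuData(Dict("rna" => rna, "atac" => atac))

size(mdata_toy["rna"].X)   # (3, 4)
```

Each modality keeps its own variables, while the observations (here, cells) are shared across modalities.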

Examples

MuData objects can be created from .h5mu files:

using Muon

mdata = readh5mu("pbmc10k.h5mu");

Individual modalities can be accessed directly by their name:

mdata["rna"]
# => AnnData object 10110 ✕ 101001

Low-dimensional representations of the data can be plotted with the plotting library of choice:

using DataFrames
using GLMakie
using AlgebraOfGraphics

df = DataFrame(LF1 = mdata.obsm["X_umap"][1,:],
               LF2 = mdata.obsm["X_umap"][2,:]);

data(df) * mapping(:LF1, :LF2) * visual(Scatter) |> draw

muon.jl's People

Contributors

dcjones, dmbates, gtca, ilia-kats, smnbl

muon.jl's Issues

AnnData functions?

Does Muon support the functions from the original Python AnnData, for example adata.var_names_make_unique()?
Thanks for developing this package!
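
Whether Muon.jl exposes an equivalent is not stated in this thread. As a rough sketch of what such a function does, duplicate names can be made unique in plain Julia by appending a numeric suffix to every repeat. The helper make_unique and its sep keyword are hypothetical names chosen for this example, not part of any documented Muon.jl API.

```julia
# Hypothetical helper, not part of Muon.jl's documented API.
function make_unique(names::AbstractVector{<:AbstractString}; sep::AbstractString = "-")
    seen = Set{String}(names)       # every name currently in use
    counts = Dict{String, Int}()    # next suffix to try for each repeated name
    result = Vector{String}(undef, length(names))
    for (i, name) in enumerate(names)
        n = get(counts, name, 0)
        if n == 0
            result[i] = name        # the first occurrence is kept unchanged
            counts[name] = 1
        else
            candidate = string(name, sep, n)
            while candidate in seen # skip suffixes that collide with existing names
                n += 1
                candidate = string(name, sep, n)
            end
            result[i] = candidate
            push!(seen, candidate)
            counts[name] = n + 1
        end
    end
    return result
end

make_unique(["geneA", "geneB", "geneA", "geneA"])
# returns ["geneA", "geneB", "geneA-1", "geneA-2"]
```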

New release?

We have a data set where the released version of Muon fails, but the main branch succeeds. Would it be possible to get a new release? It seems that 38 commits have been merged since the last release.

structured array I/O

There might be a problem with reading categorical variables in .h5ad/.h5mu files:

MethodError: Cannot `convert` an object of type

Vector{NamedTuple{(Symbol("0"), Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7"), Symbol("8"), Symbol("9"), Symbol("10"), Symbol("11"), Symbol("12"), Symbol("13")), NTuple{14, Float32}}} to an object of type

Union{AbstractString, Number, DataFrames.DataFrame, Dict, AbstractArray{<:AbstractString}, AbstractArray{<:Number}}

A quick fix would be e.g. to add a permissive type to the read_dict_of_mixed() function:

function read_dict_of_mixed(f::HDF5.Group; kwargs...)
    ret = Dict{
        String,
        Union{
            ...
            Dict,
            <:Any,
        },
    }()
...

But I expect more reasonable types can be used in the end.
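
The failure can be reproduced without any HDF5 file. The sketch below assumes that a structured array arrives in Julia as a vector of named tuples, as the error message suggests; the Restricted alias is a name made up for this example and leaves out DataFrame so that no packages are needed.

```julia
# Value type mirroring the Union from the error message (DataFrame omitted).
Restricted = Union{AbstractString, Number, Dict,
                   AbstractArray{<:AbstractString}, AbstractArray{<:Number}}

# Stand-in for a structured array: a vector of named tuples.
records = [(a = 1.0f0, b = 2.0f0), (a = 3.0f0, b = 4.0f0)]

strict = Dict{String, Restricted}()
strict["scalar"] = 1.0            # fine: a Number is covered by the Union

failed = try
    strict["records"] = records   # not covered, so convert throws
    false
catch err
    err isa MethodError
end

permissive = Dict{String, Any}()
permissive["records"] = records   # accepted once the value type is widened
```

Widening the value type avoids the error, at the cost of less specific typing for everything stored in the dictionary.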

Use PooledArrays.PooledArray for unordered categorical columns

At present readh5ad converts categorical columns in the obs and var dataframes to CategoricalArrays. These are complicated structures which can sometimes put a burden on the Julia compiler if there are many levels in the column. An alternative for unordered categorical arrays is PooledArray from the PooledArrays package. I would be happy to create a pull request if you would be willing to consider it.
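
For readers unfamiliar with the idea, a pooled column stores each distinct label once and keeps one small integer reference per element. The following is a plain-Julia sketch of that representation only; pool_labels is a hypothetical helper and does not reproduce the PooledArrays API.

```julia
# Hypothetical helper illustrating pooling; not the PooledArrays API.
function pool_labels(labels::AbstractVector{<:AbstractString})
    pool = String[]                   # distinct labels in order of first appearance
    lookup = Dict{String, UInt32}()   # label => position in `pool`
    refs = Vector{UInt32}(undef, length(labels))
    for (i, label) in enumerate(labels)
        refs[i] = get!(lookup, label) do
            push!(pool, label)        # first time this label is seen
            UInt32(length(pool))
        end
    end
    return refs, pool
end

celltypes = ["T cell", "B cell", "T cell", "monocyte", "B cell"]
refs, pool = pool_labels(celltypes)

pool[refs] == celltypes   # indexing the pool recovers the original column
```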

Error reading h5ad AnnData object file

Hello:

I am trying to read an h5ad AnnData object file in this way:

using Muon

file = "msc_sokm.h5ad"
ad = readh5ad(file)

but I get this error:

ERROR: MethodError: Cannot `convert` an object of type 
  Vector{NamedTuple{(Symbol("0"), Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7"), Symbol("8"), Symbol("9")), NTuple{10, Float32}}} to an object of type 
  Union{AbstractString, Number, DataFrame, Dict, AbstractArray{<:AbstractString}, AbstractArray{<:Number}}
Closest candidates are:
  convert(::Type{T}, ::T) where T at ~/julia-1.7.2/share/julia/base/essentials.jl:218
Stacktrace:
 [1] setindex!(h::Dict{String, Union{AbstractString, Number, DataFrame, Dict, AbstractArray{<:AbstractString}, AbstractArray{<:Number}}}, v0::Vector{NamedTuple{(Symbol("0"), Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7"), Symbol("8"), Symbol("9")), NTuple{10, Float32}}}, key::String)
   @ Base ./dict.jl:381
 [2] read_dict_of_mixed(f::HDF5.Group; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:separate_index,), Tuple{Bool}}})
   @ Muon ~/.julia/packages/Muon/eLqpV/src/hdf5_io.jl:113
 [3] read_auto(f::HDF5.Group; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:separate_index,), Tuple{Bool}}})
   @ Muon ~/.julia/packages/Muon/eLqpV/src/hdf5_io.jl:96
 [4] read_dict_of_mixed(f::HDF5.Group; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:separate_index,), Tuple{Bool}}})
   @ Muon ~/.julia/packages/Muon/eLqpV/src/hdf5_io.jl:113
 [5] AnnData(file::HDF5.File, backed::Bool, checkversion::Bool)
   @ Muon ~/.julia/packages/Muon/eLqpV/src/anndata.jl:77
 [6] readh5ad(filename::String; backed::Bool)
   @ Muon ~/.julia/packages/Muon/eLqpV/src/anndata.jl:150
 [7] readh5ad
   @ ~/.julia/packages/Muon/eLqpV/src/anndata.jl:142 [inlined]
 [8] main()
   @ Main ~/working_with_Julia/ICDM_2022/data_exploration.jl:6
 [9] top-level scope
   @ ~/working_with_Julia/ICDM_2022/data_exploration.jl:9

I am using Julia v1.7.2 and Muon v0.1.1.

I don't know if I am doing something wrong.

Thank you.

subsetting + views of MuData/AnnData

We want to be able to write mdata[1:10, 5] and get a corresponding MuData object back, as in the Python implementation. One point of discussion is the handling of views. In contrast to NumPy, Julia does not create views by default, one has to request one explicitly with @view or @views. I think it makes sense to adopt this behavior for Muon to be consistent with the rest of the Julia ecosystem.
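
The convention referred to here can be seen with an ordinary matrix. This sketch uses only Base Julia; how MuData or AnnData subsetting would behave is the proposal under discussion, not something the example demonstrates.

```julia
X = reshape(collect(1.0:20.0), 4, 5)   # 4 observations × 5 variables

sub_copy = X[1:2, 3:4]         # indexing with ranges copies the data
sub_view = @view X[1:2, 3:4]   # a view shares memory with X

X[1, 3] = -1.0                 # modify the parent after subsetting

sub_copy[1, 1]   # still 9.0, the copy is unaffected
sub_view[1, 1]   # now -1.0, the view reflects the change
```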

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Logo text

Just noticing a few issues with how the logo text in the docs (SVG file) is rendered.

Logo has text embedded as <text>

So it just defaults to browser font, since I don't have the specific font available. Probably worth exporting the text as shapes so it looks consistent.

Text looks a bit funny in dark mode

Maybe Documenter lets you specify an alternate logo for dark mode?

custom ordered set implementation with key and index access

As explained in the commit message for 6d366e0, we need to store obs_names and var_names as an ordered set to guarantee fast lookup by name. We also require the ordered set to support access by index and mapping of keys to indices. OrderedCollections deprecated the former functionality and never officially supported the latter, due to ambiguities upon deletion of elements. Our use case does not require deletion.

Unless I'm missing something, we could even get away with a completely immutable set, which would be relatively easy to implement.
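
A minimal sketch of such an immutable ordered set might look as follows. FrozenIndex and position_of are hypothetical names used for illustration; this is not the implementation that ended up in Muon.jl.

```julia
# Hypothetical type for illustration; not the implementation used in Muon.jl.
struct FrozenIndex{T} <: AbstractVector{T}
    items::Vector{T}          # position => key
    positions::Dict{T, Int}   # key => position

    function FrozenIndex(items::AbstractVector{T}) where {T}
        positions = Dict{T, Int}()
        for (i, k) in enumerate(items)
            haskey(positions, k) && throw(ArgumentError("duplicate key: $k"))
            positions[k] = i
        end
        new{T}(collect(items), positions)
    end
end

# Read-only AbstractVector interface: access by integer index.
Base.size(idx::FrozenIndex) = size(idx.items)
Base.getindex(idx::FrozenIndex, i::Int) = idx.items[i]

# Membership test and key => index lookup, both backed by the Dict.
Base.in(key, idx::FrozenIndex) = haskey(idx.positions, key)
position_of(idx::FrozenIndex, key) = idx.positions[key]

obs_names = FrozenIndex(["AAACCTG", "AAACGGG", "AAAGATG"])
obs_names[2]                        # "AAACGGG"
position_of(obs_names, "AAAGATG")   # 3
"AAACCTG" in obs_names              # true
```

Because no setindex!, push! or delete! method is defined, the positions stored in the dictionary can never go stale, which sidesteps the ambiguity that deletion would introduce.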

Installation error

Hi team,
I got this error when installing with Pkg.add("Muon"):

Unsatisfiable requirements detected for package Muon [446846d7]:
 Muon [446846d7] log:
 ├─possible versions are: 0.1.0-0.1.1 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions 0.1.0-0.1.1
 └─restricted by compatibility requirements with DataFrames [a93c6f00] to versions: uninstalled — no versions left
   └─DataFrames [a93c6f00] log:
     ├─possible versions are: 0.11.7-1.2.0 or uninstalled
     ├─restricted to versions * by Baysor [cc9f9468], leaving only versions 0.11.7-1.2.0
     │ └─Baysor [cc9f9468] log:
     │   ├─possible versions are: 0.5.0 or uninstalled
     │   └─Baysor [cc9f9468] is fixed to version 0.5.0
     ├─restricted by compatibility requirements with DataFramesMeta [1313f7d8] to versions: 0.13.0-1.2.0
     │ └─DataFramesMeta [1313f7d8] log:
     │   ├─possible versions are: 0.4.0-0.8.0 or uninstalled
     │   ├─restricted to versions * by Baysor [cc9f9468], leaving only versions 0.4.0-0.8.0
     │   │ └─Baysor [cc9f9468] log: see above
     │   └─restricted by compatibility requirements with DataFrames [a93c6f00] to versions: 0.4.0-0.7.1 or uninstalled, leaving only versions: 0.4.0-0.7.1
     │     └─DataFrames [a93c6f00] log: see above
     └─restricted by compatibility requirements with TableReader [70df011a] to versions: 0.11.7-0.21.8, leaving only versions: 0.13.0-0.21.8
       └─TableReader [70df011a] log:
         ├─possible versions are: 0.1.0-0.4.0 or uninstalled
         └─restricted to versions * by an explicit requirement, leaving only versions 0.1.0-0.4.0

Could you suggest a way to get around this?
Many thanks!

Sparse matrices should be parsed

Currently, reading a sparse matrix results in a Dict{}:

mdata["atac"].layers["counts"]
Dict{String,Any} with 3 entries:
  "data"    => Float32[4.0, 2.0, 2.0, 2.0, 1.0, 4.0, 2.0, 2.0, 2.0, 2.0  …  8.0
  "indptr"  => Int32[0, 7251, 13779, 17102, 21369, 33001, 40243, 48843, 56561, …
  "indices" => Int32[7, 11, 12, 25, 36, 38, 41, 50, 51, 53  …  105979, 105980, …

This has to be parsed into a sparse matrix (SparseMatrixCSC from SparseArrays).
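
One way the three arrays could be assembled is sketched below. It assumes the data is stored in CSR layout with zero-based indices and that the matrix shape is known separately; both are assumptions, since the file may also use CSC layout. csr_to_sparse is a hypothetical helper, not a Muon.jl function.

```julia
using SparseArrays

# Hypothetical helper: build a Julia sparse matrix from zero-based CSR arrays.
function csr_to_sparse(data, indices, indptr, nrows, ncols)
    rows = Vector{Int}(undef, length(data))
    for r in 1:nrows
        # indptr[r] and indptr[r + 1] delimit the entries of row r (zero-based, half-open)
        for k in (indptr[r] + 1):indptr[r + 1]
            rows[k] = r
        end
    end
    cols = Int.(indices) .+ 1   # convert zero-based column indices to one-based
    return sparse(rows, cols, data, nrows, ncols)
end

# 3 × 4 example matrix:
# [1 0 2 0
#  0 0 3 0
#  4 5 0 6]
raw = Dict("data"    => Float32[1, 2, 3, 4, 5, 6],
           "indices" => Int32[0, 2, 2, 0, 1, 3],
           "indptr"  => Int32[0, 2, 3, 6])

A = csr_to_sparse(raw["data"], raw["indices"], raw["indptr"], 3, 4)
```

Going through sparse(rows, cols, values, m, n) does not require the column indices within a row to be sorted, which keeps the sketch robust at the cost of an extra pass over the data.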
