
Comments (4)

ivirshup commented on August 16, 2024

I think we should definitely have concat_on_disk do this (though it could possibly just count how big everything is before writing anything?).

I don't think we need to always write int64 though. Modifying existing files is a smaller use case, where we now error.
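One way to read the "count first" idea is to sum the stored elements of all inputs before writing and only switch to int64 when the total no longer fits in int32. The following is a minimal sketch under that assumption; `pick_index_dtype` and its parameters are hypothetical names used for illustration and are not part of anndata.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def pick_index_dtype(nnz_per_input, n_minor):
    """Return "int32" or "int64" for the concatenated indptr/indices.

    nnz_per_input: stored-element count of each matrix to concatenate.
    n_minor: size of the minor axis (columns for CSR); indices are < n_minor.
    """
    total_nnz = sum(nnz_per_input)
    # The last indptr entry equals total_nnz, and the largest possible
    # value in indices is n_minor - 1, so both must fit in int32.
    if total_nnz > INT32_MAX or n_minor - 1 > INT32_MAX:
        return "int64"
    return "int32"


# Small inputs stay int32; only an oversized total triggers int64.
print(pick_index_dtype([35_372_586, 1_000_000], n_minor=2868))
print(pick_index_dtype([2**31 - 1, 1], n_minor=2868))
```

Because the decision only needs the element counts, it can be made before any output is written.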

@ilan-gold would you be up for tackling this?

from anndata.

agdenadel commented on August 16, 2024

Following up on this issue, the workaround I proposed above doesn't work out of the box when subsetting datasets. In particular, if adata is an anndata object with int64 indptrs, the following fails:

adata = adata[:, adata.var.highly_variable]

Accessing adata.X afterwards yields

ValueError: Output dtype not compatible with inputs.

It seems like consecutive elements are fine (adata[:, 1:20].X returns <177524x19 sparse matrix of type '<class 'numpy.float32'>'), but the new object has int32 indptrs (adata[:, 1:20].X.indptr returns array([ ... ], dtype=int32)).

If I cast the indptrs back to int32, the issue is resolved, but this isn't helpful since I want to work with larger anndata objects:

adata.X.indptr = np.array(adata.X.indptr, dtype=np.int32)
adata = adata[:, adata.var.highly_variable]
adata.X
<177524x2868 sparse matrix of type '<class 'numpy.float32'>'
	with 35372586 stored elements in Compressed Sparse Row format>


agdenadel commented on August 16, 2024

@ivirshup I just found your issue in scipy

scipy/scipy#20182

I think upgrading to scipy 1.13.0 solved the subsetting issue
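If the fix did land in scipy 1.13.0, as the comment above suggests, a script could check the installed version before relying on int64 indptrs when subsetting. This is a minimal sketch; `scipy_at_least` is a hypothetical helper, and the (1, 13) threshold is taken from the comment rather than verified against the scipy changelog.

```python
def scipy_at_least(version, minimum=(1, 13)):
    """Compare the major.minor part of a version string to `minimum`."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= minimum


try:
    import scipy

    print("scipy", scipy.__version__, "ok:", scipy_at_least(scipy.__version__))
except ImportError:
    print("scipy is not installed")
```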


minigel commented on August 16, 2024

@ilan-gold Thank you for fix #1493 as I am having this same issue. I tried concatenating some large files with concat_on_disk using the version of merge.py in this fix, but encountered the following error:

Session information updated at 2024-05-09 01:16
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/scverse/lib/python3.12/site-packages/anndata/experimental/merge.py", line 632, in concat_on_disk
    _write_concat_arrays(
  File "/home/user/miniconda3/envs/scverse/lib/python3.12/site-packages/anndata/experimental/merge.py", line 300, in _write_concat_arrays
    write_concat_sparse(
  File "/home/user/miniconda3/envs/scverse/lib/python3.12/site-packages/anndata/experimental/merge.py", line 235, in write_concat_sparse
    out_dataset.append(temp_elem)
  File "/home/user/miniconda3/envs/scverse/lib/python3.12/site-packages/anndata/_core/sparse_dataset.py", line 531, in append
    indptr.resize((orig_data_size + sparse_matrix.indptr.shape[0] - 1,))
  File "/home/user/miniconda3/envs/scverse/lib/python3.12/site-packages/h5py/_hl/dataset.py", line 666, in resize
    raise TypeError("Only chunked datasets can be resized")
TypeError: Only chunked datasets can be resized 
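The last frame is h5py refusing to resize a dataset that was stored contiguously. The sketch below only reproduces that h5py behaviour on a throwaway in-memory file; the dataset names are made up, and it does not show which dataset in the files above was created without chunking or why.

```python
import h5py

# In-memory HDF5 file; nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # Created without chunks or maxshape: stored contiguously, fixed size.
    fixed = f.create_dataset("fixed", data=[0, 2, 5], dtype="int32")
    # Passing maxshape makes h5py enable chunking, so the dataset can grow.
    growable = f.create_dataset(
        "growable", data=[0, 2, 5], dtype="int32", maxshape=(None,)
    )

    fixed_is_chunked = fixed.chunks is not None
    growable_is_chunked = growable.chunks is not None

    try:
        fixed.resize((5,))
        fixed_error = None
    except TypeError as err:
        fixed_error = str(err)

    growable.resize((5,))
    growable_shape = growable.shape

print(fixed_is_chunked, fixed_error)
print(growable_is_chunked, growable_shape)
```

An append-style writer therefore needs its target datasets to have been created as chunked (for example with a `maxshape`) in the first place.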

