Comments (8)

jbms avatar jbms commented on June 5, 2024

Yes, you can do that with the stack driver:

https://google.github.io/tensorstore/driver/stack/index.html

This driver allows you to do the equivalent of np.stack, np.concatenate, np.block, and also more generally to overlay arrays on top of each other.

Unfortunately, we don't yet have a convenient C++ or Python API for creating a stack-driver-backed TensorStore object that corresponds to those NumPy APIs; instead, you will have to create the JSON spec yourself.
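
As a rough illustration of what a hand-written spec might look like, the sketch below builds a stack spec that plays the role of np.concatenate along one axis. It assumes each layer is supplied as a (spec, shape) pair; the helper name concat_spec and the example kvstore URLs are hypothetical, not part of TensorStore. Each layer receives a "transform" member that shifts its domain so the layers sit end to end.

```python
import copy


def concat_spec(layers, axis=0):
    """Build a stack-driver JSON spec that places `layers` end to end along `axis`.

    `layers` is a list of (spec_dict, shape) pairs. Each spec is assumed to
    describe a zero-origin array of the given shape and to have no
    "transform" member of its own.
    """
    stacked = []
    offset = 0
    for spec, shape in layers:
        layer = copy.deepcopy(spec)
        lo = [0] * len(shape)
        hi = list(shape)
        lo[axis] += offset
        hi[axis] += offset
        output = [{"input_dimension": d} for d in range(len(shape))]
        if offset:
            # Stacked index i along `axis` reads source index i - offset.
            output[axis]["offset"] = -offset
        layer["transform"] = {
            "input_inclusive_min": lo,
            "input_exclusive_max": hi,
            "output": output,
        }
        stacked.append(layer)
        offset += shape[axis]
    return {"driver": "stack", "layers": stacked}


# Hypothetical inputs: two zarr arrays with shapes (10, 4) and (5, 4).
spec = concat_spec([
    ({"driver": "zarr", "kvstore": "file:///tmp/a.zarr/"}, (10, 4)),
    ({"driver": "zarr", "kvstore": "file:///tmp/b.zarr/"}, (5, 4)),
])
```

The resulting dict could then be passed to tensorstore.open(spec).result(); with the shapes assumed here, the combined domain would be [0, 15) by [0, 4).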

from tensorstore.

jamespinkerton avatar jamespinkerton commented on June 5, 2024

This is very helpful. I think it's exactly what I wanted. I have a follow-up question that I'm struggling with. I have a snippet of code that reindexes each element. I don't know how to combine this reindexing with TensorStore stack. I'm reindexing both on the output and the input, which makes it tricky. Is it possible to translate this into stack?

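import numpy as np
import tensorstore


def select_tensorstore(
    urls: list[str],
    start_ixs_list: list[list[str]],
    end_ixs: list[str],
) -> np.ndarray:
    r = []
    for url, start_ixs in zip(urls, start_ixs_list):
        ts_spec = {"driver": "zarr", "kvstore": url}
        z1 = tensorstore.open(ts_spec, create=False, read=True).result()
        # Map each key to its source column so output and input columns pair up by key.
        src_col = {key: i for i, key in enumerate(start_ixs)}
        out_ix = [i for i, key in enumerate(end_ixs) if key in src_col]
        in_ix = [src_col[key] for key in end_ixs if key in src_col]
        # Columns whose key is missing from this array stay zero.
        out = np.zeros((z1.shape[0], len(end_ixs)), dtype=np.float32)
        # Read only the selected columns (outer indexing), then scatter them.
        out[:, out_ix] = z1.oindex[:, in_ix].read().result()
        r.append(out)
    return np.concatenate(r, axis=0)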

jbms avatar jbms commented on June 5, 2024

stack can definitely do the equivalent of the outer concatenate, and serves that purpose well.

The out[:, out_ix] = ... is basically a scatter operation. TensorStore doesn't specifically support a "virtual scatter" operation, but it could be done with the stack driver. However, that would require k stack layers, where k is the number of contiguous components within out_ix, or for simplicity, len(out_ix) stack layers. Depending on the size in the other dimensions, this might have too much overhead, though it may also work fine.
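
To get a feel for how many layers that would mean, the number of contiguous components in out_ix can be counted with a few lines of plain Python. This is only a sketch; contiguous_runs is a hypothetical helper, and it assumes out_ix is sorted and has no duplicates.

```python
def contiguous_runs(indices):
    """Split a sorted list of distinct ints into half-open (start, stop) runs."""
    runs = []
    for i in indices:
        if runs and runs[-1][1] == i:
            # Extend the current run by one position.
            runs[-1] = (runs[-1][0], i + 1)
        else:
            # Start a new run at a gap.
            runs.append((i, i + 1))
    return runs


out_ix = [0, 1, 2, 5, 7, 8]
runs = contiguous_runs(out_ix)  # [(0, 3), (5, 6), (7, 9)], so k = 3
```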

If, instead of 0, it would be okay to substitute an arbitrary value at out[:, j] for values of j not in out_ix, then you could instead invert out_ix (substituting e.g. an index of 0 for missing positions) and then just use regular indexing rather than this scatter operation.
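
One way to read "inverting out_ix" is to compute, for every output column, which source column to gather from, falling back to a fixed column when the key is missing. The sketch below does this with the key lists from the question; gather_index and its fill parameter are hypothetical names.

```python
def gather_index(start_ixs, end_ixs, fill=0):
    """For each output key, return the source column to read and whether it is real.

    Missing keys are pointed at column `fill`, so their values are arbitrary
    (a copy of that column) rather than zero.
    """
    src_col = {key: i for i, key in enumerate(start_ixs)}
    cols = [src_col.get(key, fill) for key in end_ixs]
    present = [key in src_col for key in end_ixs]
    return cols, present


cols, present = gather_index(["a", "b", "c"], ["c", "x", "a"])
# cols == [2, 0, 0]; present == [True, False, True]
```

The cols list can then be used directly for regular indexing, for example z1.oindex[:, cols], and the present mask identifies which output columns would need to be overwritten afterwards if zeros are required.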

Another way to accomplish this with tensorstore is to use the virtual_chunked adapter. That creates a tensorstore from an arbitrary Python function. You could use virtual_chunked with the code from within your loop, and then apply the stack driver to concatenate the virtual_chunked drivers.
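
A minimal sketch of the function that would sit behind virtual_chunked is shown below. It fills one requested chunk of the reindexed output from a source array. To keep it runnable without TensorStore, the source is a NumPy array and the chunk domain is a stand-in object exposing inclusive_min and exclusive_max; make_read_function is a hypothetical helper, and the wrapping call appears only as a comment.

```python
from types import SimpleNamespace

import numpy as np


def make_read_function(source, in_ix, out_ix):
    """Return a read_function(domain, array, read_params) for one reindexed array.

    `source` is a 2-D NumPy array here; with a real TensorStore source the
    selected block would need an explicit read, e.g. `.read().result()`.
    """
    in_ix = np.asarray(in_ix, dtype=np.int64)
    out_ix = np.asarray(out_ix, dtype=np.int64)

    def read_function(domain, array, read_params=None):
        row_lo, col_lo = domain.inclusive_min
        row_hi, col_hi = domain.exclusive_max
        array[...] = 0  # columns with no source stay zero
        # Keep only the output columns that fall inside this chunk.
        keep = (out_ix >= col_lo) & (out_ix < col_hi)
        block = source[row_lo:row_hi]
        array[:, out_ix[keep] - col_lo] = block[:, in_ix[keep]]

    return read_function


# Stand-in check with a fake chunk domain covering rows 1..2, columns 0..4.
source = np.arange(12, dtype=np.float32).reshape(3, 4)
read_fn = make_read_function(source, in_ix=[2, 0], out_ix=[0, 3])
chunk = np.full((2, 5), -1, dtype=np.float32)
read_fn(SimpleNamespace(inclusive_min=(1, 0), exclusive_max=(3, 5)), chunk)

# With TensorStore installed, the wrapping step would look roughly like:
#   virtual = tensorstore.virtual_chunked(read_fn, dtype=tensorstore.float32, shape=[3, 5])
```

One such virtual array per input could then be placed into a stack spec for the outer concatenation.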

jamespinkerton avatar jamespinkerton commented on June 5, 2024

I think your solution 1 would have a lot of overhead and would create a lot of issues. Solutions 2 and 3 seem like the best candidates.

For solution 2, I think I could pad the last index of every archive with 0s and then it would be simple. I just have to duplicate that last index over and over again in my in_ix, right? I’m not sure if there’s an easier way to insert it without padding the original upload. I could also just do what you’re saying and overwrite with 0s after the download is complete.

jbms avatar jbms commented on June 5, 2024

You could actually use the stack driver to pad with zeros virtually, by combining the original array with an array of zeros.

Then you can do the indexing on top of that, and then use the stack driver again to concatenate.

This would probably be the best solution.
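
A possible shape for the padding step, written as a JSON spec built in Python, is sketched below. It overlays the original 2-D array with a single extra column of zeros at column index n_cols. The zeros layer uses the array driver with a 1x1 array and constant output maps so that every position reads the same element; this is an assumption based on the index transform JSON format and was not confirmed in this thread. The helper name zero_padded_spec and the kvstore URL are hypothetical.

```python
def zero_padded_spec(spec, shape, dtype="float32"):
    """Stack `spec` (a zero-origin 2-D array of `shape`) with one virtual zero column."""
    n_rows, n_cols = shape
    # Give the original layer explicit bounds via an identity transform.
    base = dict(spec)
    base["transform"] = {
        "input_inclusive_min": [0, 0],
        "input_exclusive_max": [n_rows, n_cols],
        "output": [{"input_dimension": 0}, {"input_dimension": 1}],
    }
    zeros = {
        "driver": "array",
        "dtype": dtype,
        "array": [[0]],
        "transform": {
            "input_inclusive_min": [0, n_cols],
            "input_exclusive_max": [n_rows, n_cols + 1],
            # Constant output maps: every position reads element [0, 0].
            "output": [{"offset": 0}, {"offset": 0}],
        },
    }
    return {"driver": "stack", "layers": [base, zeros]}


padded = zero_padded_spec({"driver": "zarr", "kvstore": "file:///tmp/a.zarr/"}, (10, 4))
```

After opening the padded spec, missing keys can be pointed at column n_cols (4 in this example) when building the gather index, so they read back as zeros.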

jamespinkerton avatar jamespinkerton commented on June 5, 2024

OK I'm very close to having a solution. The last issue is the lack of a Python API for the stack driver. Is there a way for me to build the other parts using the Python API, then extract the JSON from that and re-compose it for stack?

The documentation is pretty good for all things Python, but when you have to go to the JSON spec it gets very tricky (there are not a lot of examples).

jbms avatar jbms commented on June 5, 2024

Adding a Python API for the stack driver is on the TODO list.

You can just call store.spec().to_json() to get the JSON representation.
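
Put together, the composition step could look like the sketch below: take stores that were prepared with the Python API, extract each one's JSON, and place the results in a stack spec. The helper stack_spec_from_stores is hypothetical, and FakeStore is only a stand-in so the example runs without TensorStore; real TensorStore objects provide the same spec().to_json() call.

```python
def stack_spec_from_stores(stores):
    """Collect each store's JSON spec into a stack-driver spec.

    Each element only needs a `.spec().to_json()` method. The stores are
    assumed to already occupy the domains they should have in the result.
    """
    return {"driver": "stack", "layers": [s.spec().to_json() for s in stores]}


class FakeStore:
    """Stand-in for an opened TensorStore, used only to exercise the helper."""

    def __init__(self, json_spec):
        self._json = json_spec

    def spec(self):
        return self

    def to_json(self):
        return self._json


combined = stack_spec_from_stores([
    FakeStore({"driver": "zarr", "kvstore": "file:///tmp/a.zarr/"}),
    FakeStore({"driver": "zarr", "kvstore": "file:///tmp/b.zarr/"}),
])
```

With real stores, indexing applied through the Python API before the call, such as an oindex column selection or a shift like store[tensorstore.d[0].translate_to[offset]], is carried into the extracted JSON as a "transform" member, so the layers arrive already positioned.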

jamespinkerton avatar jamespinkerton commented on June 5, 2024

OK I got this working. Thanks for your help!
