Comments (8)
Yes, you can do that with the stack driver:
https://google.github.io/tensorstore/driver/stack/index.html
This driver allows you to do the equivalent of np.stack, np.concatenate, np.block, and also more generally to overlay arrays on top of each other.
Unfortunately, we don't yet have a convenient C++ or Python API for creating a stack driver-backed TensorStore object that corresponds to those various NumPy APIs; instead you will have to create the JSON spec yourself.
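As a rough sketch of what such a hand-written spec could look like, the helper below builds a stack spec that concatenates 2-D zarr arrays along axis 0. The helper name concat_spec, the example URLs, and the requirement to pass each array's row count up front are assumptions made for illustration. The "layers" member and the per-layer "transform" follow the stack driver and index transform documentation, but they should be checked against the TensorStore version in use.

```python
def concat_spec(urls, row_counts, num_cols):
    """Build a stack-driver JSON spec concatenating 2-D zarr arrays on axis 0.

    Hypothetical helper: row_counts[i] is the number of rows stored at
    urls[i]; every array is assumed to have num_cols columns and origin 0.
    """
    layers = []
    start = 0
    for url, rows in zip(urls, row_counts):
        layers.append({
            "driver": "zarr",
            "kvstore": url,
            # Shift this layer so its rows occupy [start, start + rows) in
            # the stacked domain: source row = stacked row - start.
            "transform": {
                "input_inclusive_min": [start, 0],
                "input_exclusive_max": [start + rows, num_cols],
                "output": [
                    {"input_dimension": 0, "offset": -start},
                    {"input_dimension": 1},
                ],
            },
        })
        start += rows
    return {"driver": "stack", "layers": layers}


# Hypothetical URLs and shapes, for illustration only.
spec = concat_spec(["file:///tmp/a.zarr/", "file:///tmp/b.zarr/"], [3, 5], 4)
```

The resulting dict can then be passed to tensorstore.open(spec).result() like any other spec.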
from tensorstore.
This is very helpful. I think it's exactly what I wanted. I have a follow-up question that I'm struggling with. I have a snippet of code that reindexes each element. I don't know how to combine this reindexing with TensorStore stack. I'm reindexing both on the output and the input, which makes it tricky. Is it possible to translate this into stack?
import numpy as np
import tensorstore


def select_tensorstore(
    urls: list[str],
    start_ixs_list: list[list[str]],
    end_ixs: list[str],
) -> np.ndarray:
    r = []
    for url, start_ixs in zip(urls, start_ixs_list):
        ts_spec = {"driver": "zarr", "kvstore": url}
        z1 = tensorstore.open(ts_spec, create=False, read=True).result()
        # Columns of the output (out_ix) and of this array (in_ix) whose keys
        # appear in both lists; assumes shared keys occur in the same order.
        out_ix = [i for i, key in enumerate(end_ixs) if key in set(start_ixs)]
        in_ix = [i for i, key in enumerate(start_ixs) if key in set(end_ixs)]
        # Scatter the selected columns into a zero-filled output block.
        out = np.zeros((z1.shape[0], len(end_ixs)), dtype=np.float32)
        out[:, out_ix] = z1.oindex[:, in_ix]
        r.append(out)
    return np.concatenate(r, axis=0)
stack can definitely do the equivalent of the outer concatenate, and serves that purpose well.
The out[:, out_ix] = ... assignment is basically a scatter operation. TensorStore doesn't specifically support a "virtual scatter" operation, but it could be done with the stack driver. However, that would require k stack layers, where k is the number of contiguous components within out_ix, or for simplicity, len(out_ix) stack layers. Depending on the size in the other dimensions, this might have too much overhead, though it may also work fine.
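To get a feel for how large k would be, the contiguous components of out_ix can be counted with a few lines of plain Python. This is only an illustrative sketch: the helper name contiguous_runs and the sample out_ix are made up, and the input is assumed to be sorted in ascending order, as it is when built with enumerate in the code above.

```python
def contiguous_runs(indices):
    """Split ascending integer indices into half-open (start, stop) runs."""
    runs = []
    for i in indices:
        if runs and runs[-1][1] == i:
            # i directly follows the current run, so extend it by one.
            runs[-1] = (runs[-1][0], i + 1)
        else:
            # Gap before i: start a new run.
            runs.append((i, i + 1))
    return runs


out_ix = [0, 1, 2, 5, 6, 9]  # hypothetical output column indices
runs = contiguous_runs(out_ix)
print(runs)       # [(0, 3), (5, 7), (9, 10)]
print(len(runs))  # 3 layers, versus len(out_ix) == 6 in the simple variant
```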
If, instead of 0, it would be okay to substitute an arbitrary value at out[:, j] for values of j not in out_ix, then you could instead invert out_ix (substituting e.g. an index of 0 for missing positions) and then just use regular indexing rather than this scatter operation.
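A small NumPy sketch of that inversion, using made-up keys and a small array standing in for z1. The names pos and gather_ix are hypothetical. The lookup is done by key rather than by position, so it does not depend on the two key lists being in the same order.

```python
import numpy as np

end_ixs = ["a", "b", "c", "d"]  # hypothetical output column keys
start_ixs = ["b", "d", "e"]     # hypothetical keys stored in one array
data = np.arange(6, dtype=np.float32).reshape(2, 3)  # stands in for z1

# For every output column, find the source column; fall back to 0 if missing.
pos = {key: i for i, key in enumerate(start_ixs)}
gather_ix = [pos.get(key, 0) for key in end_ixs]

# A plain gather replaces the scatter. Columns "a" and "c" now repeat
# source column 0 instead of holding zeros.
out = data[:, gather_ix]
```

With a TensorStore array, the same list should be usable as z1.oindex[:, gather_ix], in the same way the original code uses in_ix.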
Another way to accomplish this with tensorstore is to use the virtual_chunked adapter. That creates a tensorstore from an arbitrary Python function. You could use virtual_chunked with the code from within your loop, and then apply the stack driver to concatenate the virtual_chunked drivers.
I think your solution 1 would have a lot of overhead and would create a lot of issues. Solutions 2 and 3 seem like the best candidates.
For solution 2, I think I could pad the last index of every archive with 0s and then it would be simple. I just have to duplicate that last index over and over again in my in_ix, right? I’m not sure if there’s an easier way to insert it without padding the original upload. I could also just do what you’re saying and override with 0s after the download is complete.
You could actually use the stack driver to pad with zeros virtually, by combining the original array with an array of zeros.
Then you can do the indexing on top of that, and then use the stack driver again to concatenate.
This would probably be the best solution.
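The semantics of this pad-then-index approach can be checked in plain NumPy before writing any spec. In the sketch below, np.concatenate stands in for the zero-padding stack layer, and the keys, data, and names (padded, zero_col, gather_ix) are all hypothetical.

```python
import numpy as np

end_ixs = ["a", "b", "c", "d"]  # hypothetical output column keys
start_ixs = ["b", "d", "e"]     # hypothetical keys stored in one array
data = np.arange(1, 7, dtype=np.float32).reshape(2, 3)  # stands in for z1

# Append one all-zero column. With TensorStore this padding would be a
# second stack layer rather than a real copy of the data.
zeros = np.zeros((data.shape[0], 1), dtype=data.dtype)
padded = np.concatenate([data, zeros], axis=1)
zero_col = data.shape[1]  # index of the appended zero column

# Missing keys point at the zero column, so a gather reproduces the scatter.
pos = {key: i for i, key in enumerate(start_ixs)}
gather_ix = [pos.get(key, zero_col) for key in end_ixs]
out = padded[:, gather_ix]
```

Here out has zeros in the columns for "a" and "c", matching what the original scatter into np.zeros produces.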
OK I'm very close to having a solution. The last issue is the lack of a Python API for the stack driver. Is there a way for me to build the other parts using the Python API, then extract the JSON from that and re-compose it for stack?
The documentation is pretty good for all things Python, but when you have to go to the JSON spec it gets very tricky (not a lot of examples).
Adding a Python API for the stack driver is on the TODO list.
You can just call store.spec().to_json() to get the JSON representation.
OK I got this working. Thanks for your help!