Comments (9)
Some details on the other issues I am having... Opening a raster with opener= in a thread gives me a LookupError:
import io
import rasterio as rio
from threading import Thread

def target():
    with rio.open('tests/data/RGB.byte.tif', 'r', opener=io.open):
        pass

thread = Thread(target=target)
thread.start()
thread.join()
...
File "rasterio/_vsiopener.pyx", line 291, in _opener_registration
LookupError: <ContextVar name='opener_registery' at 0x7f86ca9e8e50>
from rasterio.
@dugalh thanks for the info! I'm going to think harder about the zipfile case. Indeed, it's the zipfile that's the container, not the directory containing the zipfile. For the threading issue, I'll double check that I'm using context vars properly. Support for VFS is something I wanted to stay away from, but we can't overwrite a file without it.
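For readers following the threading problem, here is a minimal standard-library sketch of one way a ContextVar can produce this kind of LookupError. It is not rasterio's code: the names `registry` and `registry_with_default` are hypothetical, and it only assumes that a registry variable was set in the main thread and never given a default. On standard CPython builds a new thread starts with an empty context, so a value set in the main thread is not visible there.

```python
import contextvars
import threading

# Hypothetical stand-in for a registry variable; not rasterio's actual code.
registry = contextvars.ContextVar("registry")
registry.set({})  # set only in the main thread's context

# Same idea, but created with a default value.
registry_with_default = contextvars.ContextVar("registry_with_default", default=None)

outcomes = []


def target():
    # The variable has no value in this thread's context and no default,
    # so get() raises LookupError.
    try:
        registry.get()
        outcomes.append("found")
    except LookupError:
        outcomes.append("LookupError")
    # A variable with a default can be read from any context.
    outcomes.append(registry_with_default.get())


thread = threading.Thread(target=target)
thread.start()
thread.join()
print(outcomes)
```

Giving the variable a default, or calling `get()` with a fallback argument, are two ways to avoid the exception in a worker thread.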
from rasterio.
@sgillies I ran into another problem using openers. Building overviews for a dataset created with an opener hangs (doesn't return) when the dataset is opened in an environment with GDAL_NUM_THREADS>1:
import io
from pathlib import Path
import numpy as np
import rasterio as rio

array = np.ones((3, 240, 320), dtype='uint8')
profile = rio.default_gtiff_profile
profile.update(width=array.shape[2], height=array.shape[1], count=array.shape[0], dtype=array.dtype)
filename = Path('test.tif')
filename.unlink(missing_ok=True)

with rio.Env(GDAL_NUM_THREADS=2):
    with rio.open(filename, 'w', **profile, opener=io.open) as im:
        im.write(array)
        im.build_overviews([2])
from rasterio.
@dugalh thank you for the report! I can reproduce this.
We don't need to raise in that situation, it's harmless. I think we will still need to raise in the case where we try to pass a different opener for the same directory. For example:
with rasterio.open("tests/data/RGB.byte.tif", opener=io.open) as dataset1:
    with rasterio.open("tests/data/RGB2.byte.tif", opener=fsspec.open) as dataset2:
        pass
Should raise, I think, because clobbering the already registered opener could render the opened dataset inaccessible, depending on the format and particularities of the opener. Does that make sense? Do you have any other comments on how you'd like this to work?
BTW, openers are registered by directory or container to enable sidecar files to be accessed.
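The policy described above can be sketched in plain Python. This is only an illustration under stated assumptions, not rasterio's implementation: `register_opener`, `lookup_opener`, `other_opener`, the module-level dict, and the choice of `ValueError` are all hypothetical. It shows a registry keyed by the containing directory, where re-registering the same opener is harmless, a different opener for the same directory raises, and a sidecar file resolves to the opener of its primary file.

```python
import io
from pathlib import PurePosixPath

# Hypothetical registry keyed by the directory that contains the dataset.
_registry = {}


def register_opener(path, opener):
    """Register an opener for the directory containing path."""
    key = str(PurePosixPath(path).parent)
    existing = _registry.get(key)
    if existing is not None and existing is not opener:
        # Clobbering could make an already opened dataset inaccessible.
        raise ValueError(f"a different opener is already registered for {key}")
    _registry[key] = opener
    return key


def lookup_opener(path):
    """Find the opener registered for the directory containing path."""
    return _registry[str(PurePosixPath(path).parent)]


def other_opener(path, mode="r"):
    """Hypothetical second opener, used only to trigger the conflict."""
    return io.open(path, mode)


register_opener("tests/data/RGB.byte.tif", io.open)
register_opener("tests/data/RGB2.byte.tif", io.open)  # same opener: harmless

# A sidecar file in the same directory finds the same opener.
sidecar_opener = lookup_opener("tests/data/RGB.byte.tif.aux.xml")

try:
    register_opener("tests/data/RGB2.byte.tif", other_opener)
    conflict = False
except ValueError:
    conflict = True
```

Keying by directory is what lets the sidecar lookup succeed without a separate registration, at the cost of allowing only one opener per directory.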
from rasterio.
Thanks @sgillies - this feature will be really useful.
It makes sense that you can't register different openers for the same file. I would like to be able to use the same opener for different files, possibly in the same directory though. Without understanding the internals, it is not obvious to me why openers are registered by directory. There could be a use for different openers with different files in the same directory I think.
from rasterio.
BTW, I am also having problems using the opener argument in a thread, or for overwriting an existing file. Should I raise issues for these too?
from rasterio.
@dugalh since those are closely related, let's keep them together here until we decide that they should break out.
I'd love to know more about your use cases and what you are using for openers. Can you share? At my day job, we have a virtual filesystem, implemented in C++ with Python bindings, that has more support for fancy AWS auth than GDAL does. Role chaining, specifically. We can deploy systems that read and write to a customer's S3 bucket with a lot of finesse. That's the primary driver for this feature. One thing for sure, most users should never use Python's io.open() or urllib.request.urlopen() when they could use GDAL's built-in virtual filesystems.
Registering openers by directory makes the assumption that datasets are together in a collection/folder/directory because they have the same permissions and are meant to be accessed using the same mechanisms. I think it's a pretty solid assumption, though I admit that it is very strict. Do you see exceptions that I am overlooking?
But the main benefit of registering by directory is that it makes it easy for a GDAL or OGR driver to find sidecar files like .tfw, .jpw, .aux.xml, etc, using the same opener that was registered for the primary file.
from rasterio.
I am working on an orthorectification tool where I want to allow users to specify remote / cloud input and output images, with possibly different locations. The ortho is generated tile by tile, so input & output images are open at the same time. Ortho relevant info like RPC coefficients and compound CRSs are sometimes stored in sidecar files (e.g. GDAL seems to only store some compound CRSs in PAM files), so being able to read / write those is important. Then, the tool also reads / writes other non-geospatial files, so using fsspec for everything (images and other files) makes it simpler from both user and code perspectives.
I appreciate that there are probably performance implications to this and would be interested to hear your thoughts on that. I will avoid using the opener argument where I can.
I thought having different openers in the same directory could happen with something like an fsspec zip file system and local file system operating on different files in the same directory. Is that fair? I wouldn't want allowing this to prevent access to sidecar files though.
from rasterio.
And overwriting an existing file with an fsspec filesystem opener gives me a CPLE_AppDefinedError. I'm not sure if this qualifies as a bug. Can the filesystem be used to delete?
import io
import fsspec
import rasterio as rio

profile = rio.default_gtiff_profile
profile.update(width=1, height=1, count=1)
of = fsspec.open('test.tif', 'wb')

for _ in range(2):
    with rio.open(of.path, 'w', **profile, opener=of.fs) as im:
        pass
File rasterio/_io.pyx:1483, in rasterio._io.DatasetWriterBase.__init__()
File rasterio/_io.pyx:333, in rasterio._io._delete_dataset_if_exists()
File rasterio/_err.pyx:289, in rasterio._err.exc_wrap_int()
CPLE_AppDefinedError: Deleting /vsiriopener//home/dugalh/test.tif failed: Success
from rasterio.