Comments (9)

dugalh commented on May 23, 2024

Some details on the other issues I am having... Opening a raster with opener= in a thread gives me a LookupError:

import io
import rasterio as rio
from threading import Thread


def target():
    with rio.open('tests/data/RGB.byte.tif', 'r', opener=io.open):
        pass


thread = Thread(target=target)
thread.start()
thread.join()
...
  File "rasterio/_vsiopener.pyx", line 291, in _opener_registration
LookupError: <ContextVar name='opener_registery' at 0x7f86ca9e8e50>

sgillies commented on May 23, 2024

@dugalh thanks for the info! I'm going to think harder about the zipfile case. Indeed, it's the zipfile that's the container, not the directory containing the zipfile. For the threading issue, I'll double check that I'm using context vars properly. Support for VFS is something I wanted to stay away from, but we can't overwrite a file without it.
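
To illustrate the threading point with the standard library only: a value set on a `ContextVar` in the main thread is generally not visible in a newly started thread, because on standard CPython builds the thread begins with its own empty context (some newer free-threaded builds may inherit instead). The sketch below uses a hypothetical `registry` variable, not rasterio's internals, and shows that running the thread target inside a copied context makes the value available.

```python
import contextvars
import threading

# Hypothetical stand-in for a registry kept in a context variable (no default).
registry = contextvars.ContextVar("registry")
registry.set({"tests/data": "io.open"})

results = {}


def read_registry(key):
    """Record what the current thread sees in the context variable."""
    try:
        results[key] = registry.get()
    except LookupError:
        results[key] = "LookupError"


# A plain thread: on standard builds it starts with an empty context.
bare = threading.Thread(target=read_registry, args=("bare",))
bare.start()
bare.join()

# A thread whose target runs inside a copy of the caller's context.
ctx = contextvars.copy_context()
copied = threading.Thread(target=ctx.run, args=(read_registry, "copied"))
copied.start()
copied.join()

print(results)
```

Giving the context variable a default value is another way to avoid the `LookupError`, though a default would not carry over anything registered in the main thread.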

dugalh commented on May 23, 2024

@sgillies I ran into another problem using openers. Building overviews for a dataset created with an opener hangs (doesn't return) when the dataset is opened in an environment with GDAL_NUM_THREADS>1:

import io
from pathlib import Path
import numpy as np
import rasterio as rio

array = np.ones((3, 240, 320), dtype='uint8')
profile = rio.default_gtiff_profile
profile.update(width=array.shape[2], height=array.shape[1], count=array.shape[0], dtype=array.dtype)

filename = Path('test.tif')
filename.unlink(missing_ok=True)

with rio.Env(GDAL_NUM_THREADS=2):
    with rio.open(filename, 'w', **profile, opener=io.open) as im:
        im.write(array)
        im.build_overviews([2])

sgillies commented on May 23, 2024

@dugalh thank you for the report! I can reproduce this.

We don't need to raise in that situation; it's harmless. I think we will still need to raise in the case where we try to pass a different opener for the same directory. For example:

import io
import fsspec
import rasterio

with rasterio.open("tests/data/RGB.byte.tif", opener=io.open) as dataset1:
    with rasterio.open("tests/data/RGB2.byte.tif", opener=fsspec.open) as dataset2:
        pass

Should raise, I think, because clobbering the already registered opener could render the opened dataset inaccessible, depending on the format and particularities of the opener. Does that make sense? Do you have any other comments on how you'd like this to work?

BTW, openers are registered by directory or container to enable sidecar files to be accessed.
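
The rule described above can be sketched in plain Python. This is a minimal illustration, not rasterio's implementation: `OpenerRegistry`, `register`, and `other_opener` are hypothetical names, and the registry key is assumed to be the parent directory of the path. Registering the same opener again for a directory is accepted; registering a different one raises.

```python
import io
from pathlib import PurePosixPath


class OpenerRegistry:
    """Hypothetical registry that maps a directory to a single opener."""

    def __init__(self):
        self._openers = {}

    def register(self, path, opener):
        # Key on the directory that contains the file (an assumption).
        key = str(PurePosixPath(path).parent)
        existing = self._openers.get(key)
        if existing is not None and existing is not opener:
            raise ValueError(f"a different opener is already registered for {key}")
        self._openers[key] = opener
        return key


def other_opener(path, mode="r"):
    """A second, distinct opener used only to trigger the conflict."""
    return io.open(path, mode)


registry = OpenerRegistry()
registry.register("tests/data/RGB.byte.tif", io.open)
registry.register("tests/data/RGB2.byte.tif", io.open)  # same opener: accepted

try:
    registry.register("tests/data/RGB2.byte.tif", other_opener)
    conflict = False
except ValueError:
    conflict = True

print(conflict)
```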

dugalh commented on May 23, 2024

Thanks @sgillies - this feature will be really useful.

It makes sense that you can't register different openers for the same file. I would like to be able to use the same opener for different files, possibly in the same directory though. Without understanding the internals, it is not obvious to me why openers are registered by directory. There could be a use for different openers with different files in the same directory I think.

dugalh commented on May 23, 2024

BTW, I am also having problems using the opener argument in a thread, or for overwriting an existing file. Should I raise issues for these too?

sgillies commented on May 23, 2024

@dugalh since those are closely related, let's keep them together here until we decide that they should break out.

I'd love to know more about your use cases and what you are using for openers. Can you share? At my day job, we have a virtual filesystem, implemented in C++ with Python bindings, that has more support for fancy AWS auth than GDAL does. Role chaining, specifically. We can deploy systems that read and write to a customer's S3 bucket with a lot of finesse. That's the primary driver for this feature. One thing is for sure: most users should never use Python's io.open() or urllib.request.urlopen() when they could use GDAL's built-in virtual filesystems.

Registering openers by directory makes the assumption that datasets are together in a collection/folder/directory because they have the same permissions and are meant to be accessed using the same mechanisms. I think it's a pretty solid assumption, though I admit that it is very strict. Do you see exceptions that I am overlooking?

But the main benefit of registering by directory is that it makes it easy for a GDAL or OGR driver to find sidecar files like .tfw, .jpw, .aux.xml, etc, using the same opener that was registered for the primary file.
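
A small sketch of why a directory key helps with sidecar files, under the assumption that lookups use the parent directory of the requested path. The `openers` dict and `opener_for` function are hypothetical and only mimic the idea: once the primary file's directory is registered, a request for a sidecar next to it resolves to the same opener.

```python
import io
from pathlib import PurePosixPath

# Hypothetical registry: directory -> opener, filled when the primary file is opened.
openers = {"tests/data": io.open}


def opener_for(path):
    """Return the opener registered for the directory containing path, if any."""
    return openers.get(str(PurePosixPath(path).parent))


primary = "tests/data/RGB.byte.tif"
pam_sidecar = primary + ".aux.xml"
world_file = str(PurePosixPath(primary).with_suffix(".tfw"))

for candidate in (primary, pam_sidecar, world_file):
    print(candidate, opener_for(candidate) is io.open)
```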

dugalh commented on May 23, 2024

I am working on an orthorectification tool where I want to allow users to specify remote / cloud input and output images, possibly in different locations. The ortho is generated tile by tile, so input and output images are open at the same time. Ortho-relevant info like RPC coefficients and compound CRSs is sometimes stored in sidecar files (e.g. GDAL seems to only store some compound CRSs in PAM files), so being able to read / write those is important. The tool also reads / writes other non-geospatial files, so using fsspec for everything (images and other files) makes it simpler from both user and code perspectives.

I appreciate that there are probably performance implications to this and would be interested to hear your thoughts on that. I will avoid using the opener argument where I can.

I thought having different openers in the same directory could happen with something like an fsspec zip file system and local file system operating on different files in the same directory. Is that fair? I wouldn't want allowing this to prevent access to sidecar files though.

dugalh commented on May 23, 2024

And overwriting an existing file with an fsspec filesystem opener gives me a CPLE_AppDefinedError. I'm not sure if this qualifies as a bug. Can the filesystem be used to delete?

import io
import fsspec
import rasterio as rio

profile = rio.default_gtiff_profile
profile.update(width=1, height=1, count=1)

of = fsspec.open('test.tif', 'wb')
for _ in range(2):
    with rio.open(of.path, 'w', **profile, opener=of.fs) as im:
        pass
File rasterio/_io.pyx:1483, in rasterio._io.DatasetWriterBase.__init__()

File rasterio/_io.pyx:333, in rasterio._io._delete_dataset_if_exists()

File rasterio/_err.pyx:289, in rasterio._err.exc_wrap_int()

CPLE_AppDefinedError: Deleting /vsiriopener//home/dugalh/test.tif failed: Success
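
On the question of deleting through the filesystem: fsspec filesystem objects expose `exists()` and `rm()`, so one possible workaround is to remove the target yourself before reopening it in 'w' mode. The sketch below is an assumption about how that could look, not a confirmed fix for the error above. `remove_if_exists` is a hypothetical helper, and `FakeFileSystem` is a stand-in with the same two method names so the example runs without fsspec installed.

```python
class FakeFileSystem:
    """Stand-in exposing the two fsspec-style methods used below."""

    def __init__(self, files):
        self._files = set(files)

    def exists(self, path):
        return path in self._files

    def rm(self, path):
        self._files.discard(path)


def remove_if_exists(fs, path):
    """Delete path via the filesystem object; return True if something was removed."""
    if fs.exists(path):
        fs.rm(path)
        return True
    return False


fs = FakeFileSystem({"test.tif"})
first = remove_if_exists(fs, "test.tif")   # file was present
second = remove_if_exists(fs, "test.tif")  # nothing left to remove
print(first, second)
```

With a real fsspec filesystem, the call would go before each `rio.open(..., 'w', ...)` in the loop above. Any sidecar files would need the same treatment.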