rasterio / rasterio Goto Github PK
View Code? Open in Web Editor NEWRasterio reads and writes geospatial raster datasets
Home Page: https://rasterio.readthedocs.io/
License: Other
Rasterio reads and writes geospatial raster datasets
Home Page: https://rasterio.readthedocs.io/
License: Other
Deletion of the file from scripts. Before 1.0.
It would be a very cool feature if rasterio could transparently handle iteration over blocks. Almost all my project involves raster files which are too large to load into memory.
It could maybe be something like:
with rasterio.open('input.tif') as src:
kwargs = src.meta
with rasterio.open('output.tif', 'w', **kwargs) as dst:
# Iterate over raster blocks from band 1
for block, data in src.band_blocks(1):
# Block holds xoffset, yoffset, width, height
data = data + 1
dst.write_band(1, data, block)
Like the one in reproject(). It'll boost rio_cp.
When working with NumPy dimensions are in y,x order but rasterio methods expect pixel coordinates in x,y order.
Imho this an unnecessary complication.
The current rasterio basic features example expects an proper GeoTiff file as input. The example could be easily used with e.g. JPEG files by changing:
kwargs = src.meta
kwargs.update(
dtype=rasterio.uint8,
count=1,
compress='lzw')
to something like:
try:
kwargs = src.meta
except ValueError:
kwargs ={'driver': 'GTiff',
'height': src.height,
'width': src.width }
kwargs.update(
dtype=rasterio.uint8,
count=1,
compress='lzw')
In a nutshell: let methods accept (dataset, bidx) tuples as well as ndarrays.
This would let the polygonization example
with rasterio.drivers():
with rasterio.open(raster_file) as src:
image = src.read_band(1)
results = (
{'properties': {'raster_val': v}, 'geometry': s}
for i, (s, v)
in enumerate(
shapes(image, transform=src.transform)))
be condensed to
with rasterio.drivers():
with rasterio.open(raster_file) as src:
results = (
{'properties': {'raster_val': v}, 'geometry': s}
for i, (s, v)
in enumerate(
shapes((src, 1), transform=src.transform)))
This cuts out 2 copies: source band to ndarray in read_band() and ndarray to GDAL in-memory dataset within shapes().
On your blog you describe using with
for driver management. What happens in the following case?
with rasterio.drivers():
with rasterio.drivers():
pass
with rasterio.open('rasterio/tests/data/RGB.byte.tif') as f:
pass
Here, the code is obviously 'wrong', but with
s are designed to be nested.
Maybe add a reference count to DriverManager
?
See http://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask. In rasterio, the usage would be this:
# To read
with rasterio.open(file) as src:
band = src.read_band(1)
mask = src.read_mask(1)
# To write
with rasterio.open(file, 'w', ...) as dst:
dst.write_mask(1, mask)
The GDAL functions to use are GDALGetMaskBand and GDALCreateMaskBand.
Via a creation keyword: nodata=value, applied equally to all bands.
Nothing fancy, just an description of what rasterio offers re the GDAL data model and how to go from C style programming with osgeo.gdal to Python style with rasterio.
And I don't see how to access a Driver's pfnCreate attribute from the C API, therefore rasterio will for now support opening only GTiffs in 'w' mode.
GDAL's low-level warp kernel supports warping between image buffers. For rasterio, this means that we can have an interface for warping directly from one Numpy array (or object that we can get a Cython memoryview on) to another backed by existing functions in the GDAL lib.
Such a function exists in scipy.ndimage: http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.interpolation.geometric_transform.html and it looks like it's mainly in C but one in rasterio could avoid a dependency on Scipy for a process that programmers are well used to getting from GDAL.
I'm thinking a similar interface would be good, something like
def rasterio.warp(input, output_shape=None, output=None, ...):
...
which is what rasterio's read_band and write_band follow.
References:
cc @hobu
When using rasterio with Python 2.6, I get the following error:
$ /usr/local/bin/python -c "import rasterio"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.6/site-packages/rasterio/__init__.py", line 8, in <module>
from rasterio._io import RasterReader, RasterUpdater
File "_io.pyx", line 12, in init rasterio._io (rasterio/_io.c:20093)
File "/usr/local/lib/python2.6/site-packages/rasterio/dtypes.py", line 26
dtype_rev = {v: k for k, v in dtype_fwd.items()}
^
SyntaxError: invalid syntax
A solution could be to change this:
dtype_rev = {v: k for k, v in dtype_fwd.items()}
to:
dtype_rev = dict([(v, k) for k, v in dtype_fwd.items()])
Tag and upload sdist to PyPI.
Isn't there supposed to be a rasterio/_copy.pyx
in the repo?
We would like to be able to create a numpy mask from a shapely geometry.
Here is an example that is roughly similar in intent to approach referenced in Issue #1 but different in implementation in that it uses gdal.RasterizeLayer instead of PIL ImageDraw (benchmarks comparing the two have not yet been performed):
def mask_from_geometry(ndarray_shape, geometry, projection_wkt, transform, all_touched=False):
"""
Create a boolean numpy mask from a Shapely geometry. Data must be projected to match prior to calling this function.
Areas are coded as 1 inside the geometry, and 0 outside. Invert this to use as a numpy.ma mask.
:param ndarray_shape: (rows, cols)
:param geometry: Shapely geometry object
:param projection_wkt: the projection of the geometry and target numpy array, as WKT
:param transform: the GDAL transform object representing the spatial domain of the target numpy array
:param all_touched: if true, any pixel that touches geometry will be included in mask.
If false, only those with centroids within or selected by Brezenhams line algorithm will be included.
See http://www.gdal.org/gdal__alg_8h.html#adfe5e5d287d6c184aab03acbfa567cb1 for more information.
"""
assert len(ndarray_shape) == 2
sr = osr.SpatialReference()
sr.ImportFromWkt(projection_wkt)
target_ds = gdal.GetDriverByName("MEM").Create("", ndarray_shape[1], ndarray_shape[0], gdal.GDT_Byte)
target_ds.SetProjection(sr.ExportToWkt())
target_ds.SetGeoTransform(transform)
temp_features = ogr.GetDriverByName("Memory").CreateDataSource("")
lyr = temp_features.CreateLayer("poly", srs=sr)
feature = ogr.Feature(lyr.GetLayerDefn())
feature.SetGeometryDirectly(ogr.Geometry(wkb = geometry.wkb))
lyr.CreateFeature(feature)
kwargs = {}
if all_touched:
kwargs['options'] = ["ALL_TOUCHED=TRUE"]
gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1], **kwargs)
return target_ds.GetRasterBand(1).ReadAsArray(0, 0, ndarray_shape[1], ndarray_shape[0]).astype(numpy.bool)
We would like to port this to Cython and create a pull request, but first wanted to agree on the parameters for this function. Having to pass in projection_wkt and transform parameters here feels heavy, but I cannot get GDAL to work without them here.
I see that you are not always passing transform in for _features.pyx, and see no reference to setting projection. Are you able to get by without these if you are calling GDAL from Cython instead?
TODO:
Sharp-eyed people noticed this in my tech talk this morning.
.bounds would be the extent of the entire raster in its own units (minx, miny, maxx, maxy).
.ul(x, y) proposed name would return the upper left corner coordinates of any x,y pixel.
Sean,
First of all. Congrats! Second of all, I could have been using this library over a year ago! :) sigh You create wonderful Geo libraries for Python!
This Issue is mostly to remind myself (and you might be interested to) to contribute the clipping/masking functions of the CCAV Project for which I've been working on for the past year. See: https://bitbucket.org/ccaih/ccav and specifically https://bitbucket.org/ccaih/ccav/src/tip/src/ccav/utils.py
My intention is to contribute these functions to rasterio
in some way as I believe they are meaningful and useful to this library and quite "fitting".
cheers
James
Currently there is a very limited used of the six
module dependency.
In rasterio/__init__.py
:
from six import string_types
Looking at the source code, this function from the module can be replaced by:
import sys
if sys.version_info[0] >= 3:
string_types = str,
else:
string_types = basestring,
Or, an alternative solution is to define:
if sys.version_info[0] >= 3:
basestring = str
and replace instances of string_types
with basestring
.
Put a blurb in the interpreter's banner.
Mostly just a documentation issue. Pytest runs the existing tests just fine.
Pythons 2.7 and 3.3. Cython 0.20, Numpy 1.8. GDAL 1.9.2.
When one is interested only in foreground feature shapes, masking out the background saves time that would be used calculating background shapes.
Make sure it exists before handing it over to GDALOpen.
Via read-only crs_wkt attribute.
GDALRasterIO will resample as needed.
And surface them in the meta attribute.
Use cases: optimal display of NDVI and elevation. Should expose GDAL's own color ramp function but emphasize use of custom color tables -- mappings from pixel values to RGB (or RGBA) tuples -- and external tools for generating them.
Starting point: http://www.gdal.org/gdal_8h.html#ade5abc76e229097e6799ab58414aeaed.
Usage something like:
with rasterio.open(filename, 'w', ...) as dst:
dst.write_band(1, array)
dst.write_colormap(1, {0: (0, 0, 0), 1: (1, 1, 1), ...})
GDAL's GDALPolygonize function is more like a program's main() than a library function. It reads and writes. But it can support a Python function that has no I/O and that works with numpy rasters and Fiona collections. Something like this:
import rasterio
with rasterio.open(raster_filename, 'r') as src:
image = src.read_band(1) # read
features = rasterio.features.polygonize(image) # no I/O
with fiona.open(vector_filename, 'w', **meta) as dst:
dst.writerecords(features) # write
Rough design: an ndarray is made into a GDAL MEM format raster. Extracted features are written to an OGR Memory format dataset and a Python iterator over them is returned.
It's a better name.
Instead of registering drivers on a module import and de-registering them at exit, I'm going to implement a context manager that will handle this. The canonical way to use it is like this:
import rasterio
with rasterio.drivers():
# do stuff
but since a lot of users are going to expect to call rasterio.open() and have it just work, the rasterio reader and writer can also manager drivers in their own context if there is no global one. A global GDAL driver manager is detected by head count of registered drivers.
import rasterio
with rasterio.open(...) as f:
# read or write
The above can be slower than the canonical example because it registers/deregisters every time rasterio.open() is called.
Something like
sieved = rasterio.features.sieve(image, size=5)
GDALSieveFilter() wants bands, so I'll use in-memory rasters again.
Reference: http://www.gdal.org/gdal__alg_8h.html#a33309c0a316b223bd33ae5753cc7f616
I think rasterio should allow the user to get hold of NoData values.
If it can be done, it would be great to also have some way of assisting the user in testing for NoData values in NumPy arrays. This could be great because the most obvious way of doing the test (using simple equality) will fail to work on datasets using NaN as NoData value. And unfortunately NaN NoData values are quite common out there.
Rasterio counts its bands starting with 0. Even though this goes against image processing conventions (GDAL's in particular), Python indexes start with 0, and I thought it would be right to follow suit. But it looks like a future point of pain.
Recently, I read that Guido's preference for 0 is largely to make slicing neat and appealing to programmers: https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi. A rasterio file's bands aren't sliceable at all, and aren't exposed as a list of objects in any way. The only access to them is by read_band(idx) and write_band(idx, data) methods and those indexes (alone) could begin with 1.
The new pain point would appear here (we're just moving the pain around):
for i in range(src.count):
data = src.read_band(i)
Bugs would arise from attempts to access a non-existent band 0. To help, rasterio will need raster files to expose their band indexes as an attribute. Something like this:
for i in src.band_indexes:
data = src.read_band(i)
BTW, the reason why rasterio has no Band class is to avoid dangling bands with references to closed data sources as can happen in GDAL. GDAL gives programmers an interface that suggests they are operating directly on bands as they are stored on disk, but all rasterio operations are on image arrays in Python and then written to disk.
Hi @sgillies, I want to express interest in a read/write mode. I noticed that it's partially implemented. Is there something blocking this feature?
No change to license or anything. Any objection to this, @AsgerPetersen?
See http://www.gdal.org/classGDALMajorObject.html and http://www.gdal.org/classGDALMajorObject.html#a83e2cebbfa60a5e3f5cb4aa2e9b80d12.
Idea: have get_metadata(domain="foo") return a dict, have set_metadata(key, value, domain="foo") set one item. I want to avoid a rw metadata object for the same reason I'm avoiding rw band objects: dangling references.
In some cases it may be optimal to just read/write smaller parts of the band instead of the entire band.
See also #6
Since we can only 'w' GTiffs (see #4), we need function to convert temp GTiffs to other formats (using CreateCopy).
By lack of a mailing-list or anything i'm just going to write an idea here as an issue, i hope that's alright.
The issue (https://github.com/sgillies/rasterio/issues/6) by AsgerPeterson about iterating over the blocks provides a nice example of how to open and write block for block.
Would it be within the scope of RasterIO to extent this to support multiple input and output datasets with a similar interface? It introduces a lot of potential problems like projection differences, performance hits if the blocks aren't the same etc.
A really simple implementation i made a year ago demonstrates this idea, see for example:
http://nbviewer.ipython.org/gist/RutgerK/7941518
My implementation is full of assumptions of the datasets being identical in terms of properties, making it just a very simple wrapper to eliminate the gdal.Open()
overhead.
So to modify Asger's example a bit:
with rasterio.open('source1.tif',
'source2.tif') as src:
kwargs = src.meta
with rasterio.open(('destination1.tif', 'w', **kwargs),
('destination2.tif', 'w', **kwargs),
('destination3.tif', 'w', **kwargs)) as dst:
for window in src.get_chunks( 1 ):
data = src.read_band( 1, window)
dst['destination1'].write_band( 1, data['source1'] + data['source2'], window)
etc.
In some cases you could achieve a similar effect by combining inputs in a VRT
, but this doesn't work for outputs for example.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.