
spectral-recovery's Introduction

(pre-release) spectral-recovery: spectral recovery analysis of forested ecosystems



Github: https://github.com/PEOPLE-ER/Spectral-Recovery/

Documentation: https://people-er.github.io/Spectral-Recovery/

PyPi: https://pypi.org/project/spectral-recovery/

Interactive Notebooks: Coming soon!


Overview

spectral-recovery is an open-source project and Python package that provides computationally simple, centralized, and reproducible methods for performing spectral recovery analysis of forests using satellite imagery.

The package provides straightforward interfaces to help coordinate the many moving parts of spectral recovery analysis. Users provide restoration site locations, restoration dates, and annual composites of spectral data, while spectral-recovery handles computing the spectral indices, recovery targets, recovery metrics, and more!

The package aims, through both software and supplementary documentation, to make spectral recovery analysis more approachable, encouraging the use and adoption of well-founded remote sensing techniques to aid Ecosystem Restoration research and projects.

Installation

pip install spectral-recovery

Documentation

  • View background information, static tutorials, and API references in our project documentation.
  • Try out an interactive notebook (Coming soon!)

Contributing

  • Report bugs, suggest features, and see what others are saying on our GitHub Issues page.
  • Start discussions about the tool on our discussion page.
  • Want to contribute code? See our CONTRIBUTING document for more information.

How to Cite

When using this tool in your work, we ask that you please cite the spectral-recovery tool as follows:

"Spectral Recovery method developed in the PEOPLE-ER Project, managed by Hatfield Consultants, and financed by the European Space Agency."

License

Copyright 2023 Hatfield Consultants LLP

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

spectral-recovery's People

Contributors

bnjam, kavlinclein, melissabirch, szwiep


spectral-recovery's Issues

ENH: Add windowed recovery target method

Add a recovery target method that uses a "windowed" approach, i.e. a moving window average. This will generate a per-pixel recovery target using a moving n-by-n window. Two approaches should be implemented for testing:

  1. Moving window with NaN border; border pixels in polygon will receive an average of all non-NaN pixels in the window.
  2. Moving window before NaN border; border pixels in polygon will receive an average, then NaN border will be added.

Requested by @melissabirch. Windowed approach has been used in literature.
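A minimal NumPy sketch of approach 1 (border pixels receive an average of all non-NaN pixels in the window); the `windowed_target` name is illustrative, not the tool's actual API:

```python
import numpy as np

def windowed_target(arr: np.ndarray, n: int = 3) -> np.ndarray:
    """Per-pixel recovery target from an n-by-n moving window,
    averaging only the non-NaN neighbours (approach 1 above)."""
    pad = n // 2
    # NaN border so edge windows simply have fewer valid neighbours
    padded = np.pad(arr.astype(float), pad, constant_values=np.nan)
    out = np.full(arr.shape, np.nan)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            window = padded[i:i + n, j:j + n]
            if not np.isnan(window).all():
                out[i, j] = np.nanmean(window)
    return out
```

A vectorized version (e.g. via strided views or a convolution of values and valid-pixel counts) would be preferable for real rasters; the loop form just makes the windowing explicit.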

ENH: Allow historic targets to be per-pixel OR per-polygon

Issue

Current implementation of the tool only allows historic recovery targets to be computed on a per-pixel basis. Allowing users the choice between per-pixel and per-polygon provides additional flexibility and potential use-cases for users. Feature suggested by @melissabirch.

Proposed Feature

Provide a parameter to allow users to determine their target method, either a boolean or str input. This will likely be easier to develop after the API update that will be introduced in #34.
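A rough sketch of the two scales, using plain NumPy and hypothetical names (the tool itself works on xarray objects):

```python
import numpy as np

def median_target(stack: np.ndarray, scale: str = "pixel") -> np.ndarray:
    """Historic recovery target at one of two scales (names illustrative).
    `stack` has shape (time, y, x); NaNs mark missing observations."""
    per_pixel = np.nanmedian(stack, axis=0)  # median over time, per pixel
    if scale == "pixel":
        return per_pixel
    if scale == "polygon":
        # Collapse to a single target value, broadcast to every pixel
        return np.full(per_pixel.shape, np.nanmedian(per_pixel))
    raise ValueError(f"unknown scale: {scale}")
```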

`read_and_stack_tifs` throws error for valid but unsupported band common names

Issue

The spectral_recovery.io.read_and_stack_tifs method will throw an error for common names that are valid but unknown/unused by the tool, e.g. "Red edge" or "Ultra blue". This restriction forces users to remove valid bands from their images before using the tool.

Proposed Solution

Two possible solutions come to mind.

  1. Ignore and remove the unknown bands during the call to spectral_recovery.io.read_and_stack_tifs
  2. Expand support for common names and read in all valid bands without error. This would require adding the full set of common names found in Landsat TM, Landsat ETM, Landsat OLI, and Sentinel 2 to the spectral_recovery.enums.BandCommon enum structure.

I'm leaning towards option 2 because it doesn't seem like spectral_recovery.io.read_and_stack_tifs should have the responsibility of rejecting bands that exist but are not required for any of the indices currently implemented in our tool. If we support the full suite of bands for each platform, then we can read everything in without error and drop unrequired bands after indices have been computed. Additionally, expanding the band names now could help us down the road when including more advanced/newer indices.
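For illustration, option 2 might look something like the following; the member names and values here are assumptions, not the actual contents of spectral_recovery.enums.BandCommon:

```python
from enum import Enum

class BandCommon(Enum):
    """Illustrative expanded enum. The exact member names and values for
    Landsat TM/ETM/OLI and Sentinel-2 would need to be confirmed against
    each platform's band definitions."""
    BLUE = "blue"
    GREEN = "green"
    RED = "red"
    RED_EDGE = "red edge"
    NIR = "nir"
    SWIR_1 = "swir 1"
    SWIR_2 = "swir 2"
    ULTRA_BLUE = "ultra blue"
```

With the full set present, lookups like `BandCommon("red edge")` succeed instead of raising, and unrequired bands can be dropped after index computation.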

Allow less restrictive file naming

Issue

When passing images to spectral_recovery.io.read_and_stack_tifs, each image must be named after the year it was captured/derived, e.g. "2022.tif", "2023.tif". This is true whether the user passes a list or a directory to the path_to_tifs= parameter. If a user's image names do not match this convention, they are forced to rename all of their images before using the tool.

Context

This convention was used because there needed to be a way to definitively map an image to its year/time coordinate in the DataArray structure that read_and_stack_tifs produces. See the relevant code below:

image_dict = {}
if isinstance(path_to_tifs, str):
    # check if path is a directory
    if Path(path_to_tifs).is_dir():
        path_to_tifs = list(Path(path_to_tifs).glob("*.tif"))
for file in path_to_tifs:
    with rioxarray.open_rasterio(Path(file), chunks="auto") as data:
        image_dict[Path(file).stem] = data

time_keys = []
for filename in image_dict.keys():
    if _str_is_year(filename):
        time_keys.append(pd.to_datetime(filename))
    else:
        raise ValueError(
            f"TIF filenames must be in format 'YYYY' but received: '{filename}'"
        ) from None

stacked_data = _stack_bands(image_dict.values(), time_keys, dim_name="time")

The current workflow for mapping files to time is generally:

  1. Find all TIF files in a directory or read all files passed in the list
  2. Parse years from filenames
  3. For each image, convert year filenames to datetime objects and assign as image's time coordinate

Proposed Solution

Some more thought/discussion is needed on how/if this naming convention should be loosened.
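One possible loosening, sketched below under assumed names (`infer_year` and the `year_map=` parameter are hypothetical, not part of the current API): prefer an explicit user-supplied filename-to-year mapping, and otherwise search for a 4-digit year anywhere in the stem rather than requiring the stem to *be* the year.

```python
import re
from pathlib import Path

def infer_year(path, year_map=None):
    """Map a TIF file to its year (sketch; names hypothetical)."""
    stem = Path(path).stem
    if year_map is not None and stem in year_map:
        return int(year_map[stem])
    # Fall back to the first plausible 4-digit year in the filename
    match = re.search(r"(?:19|20)\d{2}", stem)
    if match:
        return int(match.group())
    raise ValueError(f"Cannot determine a year for '{stem}'")
```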

DOC: Documentation Update

Overall Updates:

  • Landing Page README:
    • Update Workflow Diagram
  • Update API Reference
  • User Guide:
    • Rename -> Theoretical Basis
    • Split into three documents per section
    • Ensure subsections are up to date
    • Update Section 2:
      • Tool Workflow Diagram
      • Add reference to spyndex; reword our index list to be 'commonly used indexes'
      • Put a reminder to update the Restoration Polygon once the multi-polygon task has been finalized

Tutorial Updates:

  • Rename tab to Tutorials
  • 2 Subsections:
    • For Users:
      • Getting Started:
        • Reworked Problem Statement
        • Installation
        • Quick Overview
      • User Guide:
        • Terminology
        • Reading in Data
        • Computing Indices (reference to Reading in Data, about bands and such):
          • Make sure to include a paragraph on scaling data (mention that some metrics expect data to be between 0 and 1, and therefore unscaled data might lead to unexpected outputs) (YrYr, RRI, dNBR).
        • Recovery Targets:
          • Recovery Targets (Historic vs Reference)
          • Historic recovery methods (Windowed, per-pixel, etc.)
        • Plotting
        • Recovery Metrics:
          • General information about recovery metrics
          • Interpretation of the metrics.
        • Computing with Numpy vs Dask
    • For Developers (WIP):
      • Contributing
      • Restoration Area description
      • System architecture diagram
      • Evolution Roadmap

Documentation Structure (WIP):

API: Make Dask computation default

Issue

After introducing the "array_type" parameter in #33, the default was set to "numpy". Given that EO datasets are typically large, a more reasonable default would be array_type="dask".

Add documentation for writing TIFS

Issue

The project documentation does not contain any information on how to write the recovery metric results (xarray.DataArray) to TIF files. This forces users to figure it out themselves or rely on existing knowledge of how to write DataArrays to raster formats.

Proposed Solution

  • As a quick fix, add a cell in the tutorial showing how to write metric results using rioxarray, which is how the UBC team has been writing recovery metric results to TIF files.
  • Open to more discussion surrounding additional documentation and/or different documentation types/formats for explaining "writing your results"; there are plans to add tutorials on interpretation of results... so maybe writing could fit somewhere in there?

ENH: Recovery Target API Updates

Opening this issue for API changes to recovery target methods. Hoping to get some input before development starts.

Current

Currently, users import a class from the targets module, parameterize it, and then pass that object to compute_metrics:

from spectral_recovery.targets import MedianTarget, WindowedTarget

mpixel = MedianTarget(scale="pixel")
mpoly = MedianTarget(scale="polygon")
win = WindowedTarget(N=3, rm_na=False)

metrics = compute_metrics(
     ....
     recovery_target_method=mpixel # or mpoly or win
)

Then, inside of RestorationArea, the compute_recovery_targets (see #84) function is called, which clips an appropriate reference stack and then passes it to the recovery_target_method provided.

Known downsides of this implementation?

  • Impossible for users to pass custom methods.
  • Does not allow users to investigate the recovery target.
  • timeseries DataArray input must contain both reference and restoration polygons.

Positives?

  • Users don't have to compute recovery targets themselves.

Proposed

Have users compute the targets using provided (or custom) functions and then pass the targets to the compute_metrics function. The shape of the recovery_targets xr.DataArray would need to be broadcastable to the timeseries clipped to the restoration polygon, e.g.:

from spectral_recovery.targets import median_target, windowed_target

mpixel_array = median_target(timeseries, polygons, scale="pixel")
mpoly_array = median_target(timeseries, polygons, scale="polygon")
win_array = windowed_target(timeseries, polygons, N=3, rm_na=False)
custom_array = some_custom_func(timeseries, polygons, ...)

metrics = compute_metrics(
     ....
     recovery_targets=mpixel_array # or mpoly_array, win_array, or custom_array
)

This would 1) allow users to use custom target functions, 2) provide users with direct access to the recovery target values, and 3) allow users to read in a smaller array just over the reference polygons (if different than restoration polygons).

Potential downsides?

  • More complicated. Requires users to compute targets themselves rather than having RestorationArea coordinate the computation.
  • Opens up room for error? If the timeseries given to the target function and the timeseries given to compute_metrics are different and/or from different data sources, the recovery target would be invalid.

Potential variations:

  • Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None.

Please share your thoughts! @melissabirch @KavlinClein @wdkang

ENH: Split Restoration and Reference image inputs

Issue

Requiring that the restoration polygons and reference polygons share the same image stack means that both sets of polygons must be contained within the boundary of the images. This can sometimes lead to users creating/processing/storing images that are much larger than necessary, e.g. a restoration polygon in the far left corner of an AOI and a reference polygon X km away in the bottom right corner. This could be avoided if users could pass separate image stacks for reference and restoration polygons.

Proposed Feature

Split the image_stack= parameter in RestorationArea into rest_image_stack= and ref_image_stack=, and have the object process targets/metrics from the respective image stack.

Note that this would not be a backwards-compatible change AND could make the parameters slightly more confusing for users, so before merging, it should be demonstrated that this brings real speed improvements and that users can still use the tool easily.

ENH: Distinguish "not recovered" output from Y2R

Issue

Y2R currently returns NaN for pixels that have not met the recovery target in any year of the recovery window. This is not ideal for two reasons: 1) it is indistinguishable from the NaNs that mark the polygon boundaries (see: rioxarray.clip), and 2) because the NaNs are indistinguishable, it is complicated to filter "not recovered" pixels out of the result.

Proposed Feature

Y2R should return -9999, or something similar, for any pixel that has not yet recovered. -9999 is a good candidate because it is distinct from NaN, it is not positive so cannot be mistaken for a long recovery time, and it is not close to 0 so cannot be mistaken as lying along the same linear range as the other outputs (e.g. -1).
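A minimal sketch of the masking step in plain NumPy (the tool itself operates on xarray objects; `finalize_y2r` is a hypothetical name): replace NaN with -9999 only where the pixel is inside the polygon, leaving the boundary NaNs from the clip untouched.

```python
import numpy as np

NOT_RECOVERED = -9999  # sentinel value proposed above

def finalize_y2r(y2r: np.ndarray, in_polygon: np.ndarray) -> np.ndarray:
    """Distinguish 'not recovered' (-9999) from out-of-polygon NaN.
    y2r: years-to-recovery, NaN where the target was never met.
    in_polygon: boolean mask of valid (in-polygon) pixels."""
    unrecovered = np.isnan(y2r) & in_polygon
    return np.where(unrecovered, NOT_RECOVERED, y2r)
```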

BUG: Incorrect NaN replacement in Y2R

New implementation of Y2R (setting unrecovered pixels to -9999 instead of NaN) is incorrectly setting pixels that started as NaN but had values later in the timeseries to NaN.

DOC: Add Metric Interpretation section to Overview

The problem
The tool documentation does a good job of describing the indices and metrics that the tool uses. However, what is missing is a section helping the user interpret the values returned by the different recovery metrics.

Describe the solution you'd like
A short paragraph per recovery metric explaining how to interpret its values.
Something like:

"Interpretation of R80P values is similar to the RRI, with a value of zero indicating that spectral forest recovery did not occur, while a value of one indicates that NBR values have recovered to be equivalent to 80% of the pre-disturbance values. Values greater than one indicate that spectral forest recovery has exceeded that of the 80% of pre-disturbance threshold" (Frazier et al., 2018).

BUG: Allow Restoration Polygon with Non-Zero Index

Issue

If a user passes a perfectly valid restoration polygon GeoDataFrame to compute_metrics but the index value of the row is not 0, a KeyError is thrown. This is because in RestorationArea._get_dates_from_frame, the second index is hardcoded as 0:

disturbance_start = rest_frame.iloc[:, 0][0]

Proposed Solution

There is no reason to force the index of the GeoDataFrame row to be 0. RestorationArea._get_dates_from_frame should dynamically find the index value of the row and then use that to index into the GeoDataFrame.
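A minimal pandas illustration of the fix (a plain DataFrame stands in for the GeoDataFrame, and the column name `dist_start` is hypothetical): purely positional access works regardless of the index value.

```python
import pandas as pd

# Row has a non-zero index, as in the bug report
rest_frame = pd.DataFrame({"dist_start": ["2005"]}, index=[7])

# Buggy:  rest_frame.iloc[:, 0][0]  -> KeyError, label 0 does not exist
# Fixed:  iloc with both positions is label-independent
disturbance_start = rest_frame.iloc[0, 0]
```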

Bug reported by Chen on 2024/03/01.

Unexpected Y2R values

In some cases, RestorationArea.yr2() is producing unexpected results, e.g. reporting 1 year to recovery when it should be 0. This behaviour is not consistent across all inputs, and the current set of unit tests all pass, so something weird is going on. A more detailed description with examples of the behaviour will be added soon.

Lint Codebase

Code should be linted using pylint before the 0.2.0 release. This will help address inconsistencies in coding standards and find errors, making future development (and incorporation of contributions from users) easier.

ENH: Allow users to pass date values when reading in polygons

After discussion with VTT (@renneter), the new API could be more flexible by allowing users to pass a dictionary of dates (disturbance, restoration, reference) when reading in polygons. This would allow users to use the tool without forcing them to edit polygon attributes via GIS software. Would also be more lenient about pre-existing fields because the function would enforce order of date cols and just ignore extra cols/push them to the side.

rest_poly = sr.read_restoration_polygons(
    "polygons.gpkg",
    disturbance_start="2001",
    restoration_start="2002",
    reference_start="1999",
    reference_end="2000",
)

When supporting multiple polygons (#34), this can be refactored to a dictionary mapping polygon fids to a list of dates:

rest_poly = sr.read_restoration_polygons(
    "polygons.gpkg",
    dates={0: ["2001", "2002", "1999", "2000"]},
)

BUG: Wrapper err if keyword args in unexpected order

The maintain_rio_attrs wrapper assumes that the first argument of a function is an xarray.DataArray with rioxarray attributes. So if/when a user passes in their keyword parameters in a different order, an attribute error is thrown (X does not have attribute 'rio'). We should not be forcing order on keyword args.
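One possible fix, sketched with a stand-in object (the actual CRS/attribute re-attachment is elided, and the real decorator's internals may differ): duck-type on the `rio` accessor to locate the DataArray among positional *and* keyword arguments instead of assuming it is `args[0]`.

```python
import functools

def maintain_rio_attrs(func):
    """Find the rioxarray-backed argument wherever it was passed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        candidates = list(args) + list(kwargs.values())
        xarr = next((a for a in candidates if hasattr(a, "rio")), None)
        if xarr is None:
            raise TypeError("expected an argument with rioxarray attributes")
        result = func(*args, **kwargs)
        # ... re-attach CRS/attrs from `xarr` to `result` here ...
        return result
    return wrapper

class _FakeDataArray:  # stand-in with a 'rio' attribute, for the demo
    rio = object()

@maintain_rio_attrs
def clip(image=None, boundary=None):
    return image

da = _FakeDataArray()
result = clip(boundary="poly", image=da)  # keyword order no longer matters
```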

ENH: Allow skip of index computation

Issue

Users might have access to composites with indices already computed. The tool does not accept these inputs but should, so as to avoid forcing users to re-compute composites in order to use the tool.

Proposed feature

Allow band names to be indices in read_timeseries. Also allow indices in compute_metrics to be None. If None, skip the computation. If not None, check whether the band names on the DataArray are the requested indices; if so, also skip index computation.
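A sketch of the proposed skip logic, with the band data modeled as a plain dict (all names here are hypothetical; the tool operates on DataArrays):

```python
def maybe_compute_indices(data, indices, compute):
    """Decide whether index computation can be skipped.
    `data` maps band/index name -> array; `compute` is the existing
    index-computation routine."""
    if indices is None:                 # user opted out of computation
        return data
    if set(indices).issubset(data):     # data already holds the indices
        return {name: data[name] for name in indices}
    return compute(data, indices)       # otherwise compute from raw bands
```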

DOC: Misc typos

In the Polygons section:

  • Text line: TYPO - should be "Read your restoration polygon"
  • Restoration Start & Disturbance Start cells:
    • Disturbance and reference descriptions are flipped.

Originally posted by @KavlinClein in #42 (comment)

Resolve typos in decorator name.

Originally posted by @KavlinClein in #37 (comment)

ENH: Remove Platforms Parameter

Issue

The platforms= parameter in the read_and_stack_tifs method is not required. The information about which platforms the images are derived from is not pertinent, since the standardized band names appropriately distinguish the data.

Proposed Feature

Remove platforms= parameter from read_and_stack_tifs. The new call signature would become:

read_and_stack_tifs(path_to_tifs: List[str] | str, band_names: Dict[int, str | BandCommon | Index] = None, array_type: str = "numpy")

Update Contributing.md

Describe the bug
The steps the document suggests assume the contributor is working in an environment that already has the optional dependencies installed. If one creates a virtual environment, does an editable install, and attempts to run the tests, it will not work, since tools like pytest will not be installed in the virtual environment.

To Reproduce
Steps to reproduce the behavior:

  1. Switch to a dev branch
  2. Create a virtual environment
  3. Activate the venv
  4. Pull the latest updates from that branch
  5. Perform an editable install
  6. Attempt to run pytest src/
  7. See error

ENH: Speed up trajectory plotting

RestorationArea.plot_spectral_trajectories is very slow in comparison to the rest of the tool. Speeding up the plotting method could be a good improvement to take on.

Allow band numbers, not just common names

Issue

The spectral_recovery.io.read_and_stack_tifs method only allows bands to be named by their common names, e.g. 'SWIR', 'NIR', 'BLUE', and does not allow names like 'B1' or 'B4', which are also commonly used. The tool will currently throw an error if anything other than common names is passed. This potentially forces users to rename all their image bands, a frustration if they have many images in their timeseries. For ease of use, the read_and_stack_tifs method should support a less restrictive set of band names.

Context

The band names were originally restricted to common names because the tool is designed to be platform independent. Using only common names allowed the tool to ignore band number differences between platforms (e.g. Landsat ETM B4 = "NIR", Sentinel 2 B4 = "Red") and compute indices using common names only.

Proposed Solution

With incoming changes on the dev branch, the spectral_recovery.io.read_and_stack_tifs method will be using a platform= parameter. This parameter identifies the platform from which the user's images are derived.

In spectral_recovery.io.read_and_stack_tifs, we could use a mapping from band numbers ('B1') to common names, using the platform parameter to accurately map platform-dependent band numbers to common names. This would maintain the use of common names internally while allowing more flexible input from users.

A bit more thought is needed about how and where to implement the mapping structures. My first thought is to put them in spectral_recovery.utils and import into io from there.
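For illustration, the mapping structures might look like this; the table values below are assumptions (only the two B4 examples above come from this issue) and must be verified against each platform's official band definitions:

```python
# Hypothetical mapping tables -- verify values before use.
LANDSAT_ETM_TO_COMMON = {
    "B1": "blue", "B2": "green", "B3": "red",
    "B4": "nir", "B5": "swir16", "B7": "swir22",
}
SENTINEL_2_TO_COMMON = {
    "B2": "blue", "B3": "green", "B4": "red",
    "B8": "nir", "B11": "swir16", "B12": "swir22",
}
PLATFORM_MAPS = {
    "landsat_etm": LANDSAT_ETM_TO_COMMON,
    "sentinel_2": SENTINEL_2_TO_COMMON,
}

def to_common_names(band_names, platform):
    """Translate platform band numbers to common names; names that
    are already common (or unknown) pass through unchanged."""
    mapping = PLATFORM_MAPS[platform]
    return [mapping.get(name, name) for name in band_names]
```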

ENH: Support more than one restoration polygon

Feature Description

The tool should support the processing of more than one polygon at a time. Users can currently iterate over a set of polygons, building a new RestorationArea object for each polygon, but this does not take advantage of the multi-dimensional broadcasting built into xarray.

Supporting multiple polygons was pushed off because each polygon needs a disturbance and restoration start date set, which complicated the potential solution at the start of the project. With more things in place in the tool now, it should be possible to figure out a way to allow multiple polygons.

Potential Solution

TBD.

MAINT: Refactor after API Overhaul

Issue + Proposed Changes

Some logic got left in places that would be more appropriate elsewhere after the new API in #72.

  • Plotting logic should be moved from RestorationArea into plotting module.
  • Move compute_metrics to a more appropriate module. Maybe metrics?
  • RestorationArea should only be responsible for coordinating the restoration/reference sites and the timeseries data
    • Loose validation logic in restoration should be moved into RestorationArea
    • Possibly out of scope: metric methods should be removed from RestorationArea, compute_metrics should use the functions imported from metrics. Maybe refactor metrics calls to accept a RestorationArea object?

DOC: Add param details to guide

  • Might be worth explaining how to use certain parameters when working with a reference method or a historical method? Like two small example cells showing how you might write the code for each use case? Or maybe just adding which parameters are optional could be helpful?

Originally posted by @KavlinClein in #42 (comment)

ENH: Add default index constants

Issue

After updating the index computation to use Spyndex in #64, users are required to provide constant values for indices that use them (e.g. L=0.5 for SAVI). Our previous implementation used hard-coded defaults in the formulas but did not allow users to change the constant values. We want to keep the flexibility of choice but have defaults in place for users who are happy with the default values.

Proposed feature

Add default values for supported indices. Use those if the user does not provide constant values to plot_spectral_trajectory or compute_metrics. Important: check that the default values we use are not platform-dependent.
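A sketch of how defaults could be stored and merged with user-provided values (`DEFAULT_CONSTANTS` and `resolve_constants` are hypothetical names; the SAVI value comes from this issue, while the EVI values are commonly cited and should be verified against the source literature):

```python
# Hypothetical defaults -- confirm each value before use.
DEFAULT_CONSTANTS = {
    "SAVI": {"L": 0.5},
    "EVI": {"g": 2.5, "C1": 6.0, "C2": 7.5, "L": 1.0},
}

def resolve_constants(index, user_constants=None):
    """Merge user-supplied constants over the defaults for an index,
    so user choices always win but defaults fill the gaps."""
    constants = dict(DEFAULT_CONSTANTS.get(index, {}))
    constants.update(user_constants or {})
    return constants
```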
