mheberger / delineator Goto Github PK

Fast, accurate watershed delineation using hybrid vector- and raster-based methods and data from MERIT-Hydro

License: MIT License

Python 86.58% HTML 13.42%

delineator's Introduction

delineator.py

Python scripts for fast, accurate watershed delineation for any point on the Earth's land surface, using a hybrid of vector- and raster-based methods and the datasets MERIT-Hydro and MERIT-Basins.

Citation DOI: 10.5281/zenodo.7314287

Online demo: https://mghydro.com/watersheds/

(The web app is free and easy to use, and probably good enough for most users.)

Using these scripts

This repository includes sample data covering Iceland. To delineate watersheds in other locations, you will need to download datasets from MERIT-Hydro and MERIT-Basins. Instructions on how to get the data and run the script are provided below.

To get started, download the latest release from this GitHub repository. If you're a GitHub user, you can fork the repository.

These scripts were developed with Python version 3.9, and I later tested them with versions 3.10 and 3.11.

I recommend creating a Python virtual environment in which to run delineator.py. Here is a good introduction to virtual environments -- why they are useful, and how to use them. You can create and activate the virtual environment, then install all the required packages with the following commands.

Open the Terminal (Linux and Mac) or Command Prompt (Windows), cd to the directory containing delineator.py and related files, then enter these commands.

To create the virtual environment:

python -m venv venv

To activate the virtual environment:

# Windows Command Prompt or PowerShell
venv\Scripts\activate.bat

# Windows PowerShell
venv\Scripts\Activate.ps1

#Linux and MacOS venv activation
$ source venv/bin/activate

Next, install required packages:

$ pip install -r requirements.txt

This script uses the latest versions of Python packages circa November 2023, with one exception. I had to use an older version of GeoPandas (0.13.2 instead of 0.14), because of this: Issue #3054.

It seems that many of the libraries I'm using are under very active development, and upgrading packages usually breaks the code, requiring lots of careful debugging.

After this, follow the instructions below for how to get the input data, configure the settings, and run delineator.py.

Overview of using delineator.py

The major steps are the following, with more detailed instructions below.

Download basin-scale MERIT-Hydro raster data (mghydro.com)
Download MERIT-Basins vector data (reachhydro.com)
Download simplified MERIT-Basins data (mghydro.com)
Create a CSV file with your desired watershed outlet points
Edit settings in config.py
Run delineator.py to delineate watersheds
Review output
Run again to fix mistakes (repeat steps 4 – 7)

Before you begin downloading data in steps 1 and 2, determine which files you need based on your region of interest. The data files are organized into continental-scale river basins, or Pfafstetter Level 2 basins. There are 61 of these basins in total. Basins are identified by a 2-digit code, with values from 11 to 91.

Detailed Instructions

1. Download MERIT-Hydro raster data

To delineate watersheds in the default "high-precision mode," you will need two sets of MERIT-Hydro gridded (or raster) data: flow accumulation and flow direction. The authors of the MERIT-Hydro created 5-degree tiles, but this script needs seamless layers that cover entire river basins. Download these data here: https://mghydro.com/watersheds/rasters

Update config.py to tell the script where to find these data files. Modify the variables:

MERIT_FDIR_DIR (for flow direction)
MERIT_ACCUM_DIR (for flow accumulation)

2. Download MERIT-Basins vector data

Download the shapefiles for unit catchments and rivers. Follow the instructions here: https://www.reachhydro.org/home/params/merit-basins

In the folder pfaf_level_02 , download two sets of files:

unit catchment shapefiles: cat_pfaf_##_MERIT_Hydro_v07_Basins_v01.shp
river flowline shapefiles: riv_pfaf_##_MERIT_Hydro_v07_Basins_v01.shp

In these files, ## is the Pfafstetter Level 2 basin code. See the figure above to determine which of the 61 level 2 basins you need, depending on your region of interest.

Unzip these files and save them to a folder on your hard drive. Then, in config.py, update the variables HIGHRES_CATCHMENTS_DIR and RIVERS_DIR.

3. Download simplified MERIT-Basins data

In "low-resolution" mode, the script will look for shapefiles with simplified unit catchment boundaries. I have created these files, and you can download them here:

https://mghydro.com/watersheds/share/catchments_simplified.zip

Unzip and save these files to a folder on your computer. In config.py, update the variable LOWRES_CATCHMENTS_DIR.

4. Create a CSV file with your desired watershed outlet points

The script reads information about your desired watershed outlet points from a plain-text comma-delimited (CSV) file. Edit this file carefully, as the script will not run if this file is not formatted correctly.

The CSV file must contain three required fields or columns, and another two optional fields that are useful when reviewing the results.

id - required: a unique identifier for your watershed or outlet point, an alphanumeric string. May be any length, but shorter is better. The script uses the id as the filename for output, so avoid using any forbidden characters. On Linux, do not use the forward slash /. On Windows, the list of forbidden characters is slightly longer (\< \> : " / \ | ? \*).
lat - required: latitude in decimal degrees of the watershed outlet. Avoid using a whole number without a decimal in the first row. For example, use 23.0 instead of 23.
lng - required: longitude in decimal degrees
name - optional: a name for the watershed outlet point. The text in this field should be surrounded by straight quotes. For example, "Elwha River at McDonald Bridge near Port Angeles, WA"
area - optional: if you have an a priori estimate of the watershed area, provide it here in square kilometers, km².

All latitude and longitude coordinates should be in decimal degrees (EPSG: 4326, https://spatialreference.org/ref/epsg/wgs-84/).

5. Update `config.py`

Read through the options and set the variables as you wish. Detailed instructions are below.

6. Run `delineator.py` to delineate watersheds

Once you have downloaded the datasets listed above, and updated config.py, you are ready to delineate watersheds. Run delineate.py from your favorite Python IDE or from the command line:

>> python delineate.py

You will get a couple of warnings about an older version of the library Shapely. I haven't had time to migrate to Shapely 0.20, which will require some reprogramming.

7. Review results

The script can output watersheds in several different geodata formats, as long as it is supported by GeoPandas. Shapefiles are popular, but I recommend GeoPackage, as it is a more modern and open format. To get a full list of available formats, follow the directions here.

The script also can create web page to review your results on an interactive map. In config.py, set MAKE_MAP = True, and enter your desired file path in MAP_FOLDER. The script will create .js files for each watershed with the geodata in GeoJSON format readable by the web browser.

To view your watersheds, open the file _viewer.html in a web browser and click on a watershed ID. The table is sortable and searchable.

In the example dataset I included, the results for "Blanda River at Langamyri" appear to be totally wrong. I think there is a typo in the coordinates. The remainder of the results look quite reasonable.

8. Run again to fix mistakes

Automated watershed delineation often makes mistakes. The good news is that these errors can often be fixed by slightly moving the location of your watershed outlet.

Repeat steps 4 to 7 by creating a new CSV file, or modifying your existing file, using revised coordinates. The script will automatically overwrite existing files, so make sure to back up anything you want to save.

To quickly test different outlet locations, consider using the free Global Watersheds web app, which uses the same data and methods, but is often much faster: https://mghydro.com/watersheds/

Configuration Notes

Here are some more details on the variables in the file config.py.

Watershed Outlets CSV file

This GitHub repository contains example files to delineate several watersheds in Iceland. To delineate watersheds in other regions, follow the instructions above to download the input data files.

(The locations in outlets_sample.csv correspond to flow measurement gages in the GRDC database. I chose Iceland because the data files for this region are small. In addition, Iceland has some astonishingly beautiful rivers!)

You can create your own CSV file and name it anything you like. In config.py, enter the file name in the variable OUTLETS_CSV. Your CSV file should have a header row with these fields written exactly like this:

id, lat, lng, name, area

It does not matter whether you include a space after each comma. The fields may be in any order.

The first three fields (id, lat, lng) are required. The fields name and area are optional but are useful if you have them. You can include extra fields if you like.

Low-resolution vs. high-resolution mode

If you are going to be creating large watersheds, say over 50,000 km², the script will take more time to run. There are three steps that tend to be slow: (1) loading large datasets from shapefiles, (2) building the spatial index, and (3) merging unit catchments with a Unary Union operation using the GEOS library. On my laptop, it took about 30 minutes to delineate the Amazon watershed (admittedly the largest watershed on earth).

For large watersheds, it is usually better to use low-precision mode, which runs much more quickly. There will be a slight loss in precision, which is barely noticeable in large watersheds.

In config.py, if you set LOW_RES_THRESHOLD = 50000, then all watersheds with an area greater than 50,000 km² will automatically use low-resolution mode.

You may also turn off "high resolution" mode completely. In config.py, set HIGH_RES = False.

If you only use low-resolution mode, you do not have to download the high-resolution raster data described in Step 1 above, nor do you need the high-resolution MERIT-Basins vector data from Step 2.

Low-resolution mode has two main differences. First, the program will use simplified unit catchment boundaries. Because these polygons have fewer vertices, processing them is faster. Second, the script will not perform the detailed raster-based calculations near the outlet to "split" the most downstream unit catchment. As a result, your watershed will contain some extra area downstream of your requested outlet.

To faster delineation of large watersheds, consider using the online demo version at https://mghydro.com/watersheds

The online version has a few optimizations that make it run a lot faster. First, it loads the geodata from a PostgreSQL database, rather than reading shapefiles from disk. Second, it uses PostGIS, which is known for its fast and efficient processing of vector geodata. Finally, it makes use of caching and memoization to save results from previous users.

Search Distance

Sometimes, your watershed outlet point will not fall inside one of the MERIT-Basins unit catchments. Your point could be just offshore in the ocean or an estuary, or it may happen to fall into one of the many small gaps and slivers in the source dataset.

The variable SEARCH_DIST controls how far away the script will look for the nearest catchment (in decimal degrees). Setting SEARCH_DIST = 0 means that the point MUST fall inside a unit catchment to return results. I recommend setting it to at least 0.005 for reliable results, especially if your watersheds are near the coast.

Filling holes in watershed polygons

In config.py, set FILL = True to fill in small gaps or "donut holes" in the watershed.

Oftentimes, watersheds created by computer will contain internal gaps in the watershed polygon. Many of these come from the source data. MERIT-Basins has many small gaps and slivers between unit catchments, often only a few pixels wide. I recommend allowing the program to fill these in. To do this, in config.py, set FILL = True, and enter a non-zero value for FILL_THRESHOLD.

However, there are also larger donut holes inside a watershed, representing internal sinks, out of which water cannot flow. How to handle these holes is somewhat of an outstanding question in hydrology, but my view is that most of them are unwanted, especially the smaller ones.

Larger gaps may be important, and you may not want to merge these with your watershed if they do not contribute to surface flow. As an example, here is a map of the Rio Grande watershed, with outlet coordinates at 26.05, -97.2.

Between the two main branches, the Rio Grande in the west and the Pecos River in the east, there is an endorheic basin that runs north-south for 560 km from New Mexico to Texas. Within this basin, there are several alkaline lakes or "playas," a tell-tale sign that water flows in and either seeps into the ground or evaporates.

For most applications, it will be important to recognize that this area is "disconnected" and does not contribute to surface water flow at the basin outlet. On the other hand, if you are studying groundwater recharge, these areas may be important.

Fill threshold

The script allows you to fill in small holes and keep big ones. If you have set FILL to True, the variable FILL_THRESHOLD controls what size holes get filled in.

The size threshold is roughly the number of pixels on a 3 arcsecond grid. In the source data, a pixels is a 0.000833° square. This is about 90m x 90m near the equator, or about 0.0081 km². The pixels get smaller in terms of surface area as you move north or south away from the equator.

For example, setting FILL_THRESHOLD = 10 will fill in any donut holes with a width of 10 pixels or less and leave larger holes intact.

Setting FILL_THRESHOLD = 0 will fill all donut holes, no matter their size (the same as setting it to a large number).

Simplify Output

The output from the script may contain more detail than you need. To remove some vertices from the watershed boundary, set SIMPLIFY = True.

You will also need to set SIMPLIFY_TOLERANCE to a value in decimal degrees. This corresponds to the distance parameter in the Douglas-Peucker algorithm.

Note that the vector polygons in the input data have been digitized from pixels with an edge length of 0.000833°. Setting SIMPLIFY_TOLERANCE to about half of this size or higher will remove the jagged "staircase" appearance of the watershed boundary.

You might also try setting SIMPLIFY = False and using a different method. I highly recommend mapshaper.org.

Match Area

In config.py, if you set MATCH_AREAS = True, the script will not necessarily snap your outlet point to the closest river reach. Rather, it will search the neighborhood around the outlet point until it finds a river reach whose reported upstream area is a close match to your estimated watershed area.

This feature can be useful when you are looking for large watersheds, but the script is outputting small watersheds. Sometimes, the script will find the watershed for a small tributary stream instead of the large river you are looking for.

Note: this feature is experimental and can produce weird and unexpected results. It does not work well for small watersheds, say below 1,000 km².

Also, you can only use this feature where you have provided an estimated upstream watershed area in your outlets CSV file.

If you have set MATCH_AREAS = True, you also need to provide a value for AREA_MATCHING_THRESHOLD. This value is a percentage. For example, enter 0.25 to specify that the upstream area of a river reach should be within 25% of your estimated area to be considered a match.

Pickle Files

Optional. One of the slow steps in the script is reading shapefiles and creating a GeoDataFrame. Once you have done this once, you can save time in the future by storing the GeoDataFrame as a .pkl file. Enter a file path for the constant PICKLE_DIR, and the script will automatically save files here. The script will not search for pickle files if you leave this as a blank string, '' Note that these files are not any smaller than the original shapefile, so they don't save disk space; they are just much faster to load.

Contributing

If you have any comments or suggestions, please get in touch. If you find a bug, you can report an issue here on GitHub. Finally, this code is open source, so if you are motivated to make any modifications, additions, or bug fixes, you can fork the repository and then do a pull request on GitHub.

delineator's People

Contributors

Stargazers

Watchers

Forkers

joshseeley marcellobenigno athibault92 senkukogetsu yimsun97 macanalestorres gurbuzf fabric-io-rodrigues mhle510 angesom12 prakrutkansara dpasang alexdo1 tomgermann ssd-77 ischicker nbiswasuw forestbat agredo182

delineator's Issues

Delineation error showing series object has no attribute uparea

I am able to run the sample data file. However, when trying to run for Saudi Arabia (Catchment number 29) got the following error.

Beginning delineation for 1 outlet point(s) in Level 2 Basin #29.
Fetching BASIN # 29 catchment data from pickle file.
Reading data table for rivers in basin 29
Fetching BASIN # 29 catchment data from pickle file.
Performing spatial join on 1 outlet points in basin #29

Delineating watershed 1 of 1, with outlet id = 6401070
Traceback (most recent call last):
File "c:\Users\ssdas\Downloads\delineator-main\delineator-main\delineate.py", line 730, in
delineate()
File "c:\Users\ssdas\Downloads\delineator-main\delineator-main\delineate.py", line 441, in delineate
up_area = rivers_gdf.loc[terminal_comid].uparea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ssdas\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py", line 5989, in getattr
return object.getattribute(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'uparea'

Issue running get_subdivided_merit_polygon function

When I run the delineator script, I come across this error:
AttributeError: 'MultiPolygon' object has no attribute 'exterior'
It stems from the line calling the get_subdivided_merit_polygon function in delineator
poly, lat_snap, lng_snap = py.merit_detailed.get_subdivided_merit_polygon(wid, basin, lat, lng, catchment_poly, bSingleCatchment)

and subsequently in this line in the merit_detailed script:
filled_poly = Polygon(poly.exterior.coords)

looks like my catchment_poly is a Multipolygon, so converting it to a polygon and then back to multipolygon doesn't work, but if I skip those steps, it still doesn't work

Error batch processing points

I'm running the delineator tool on over 300 points in my basin csv, and I get this error after 228 watersheds have been delineated:
IndexError: index 0 is out of bounds for axis 0 with size 0

Full snippet of error:

All the data I'm using is as provided in the README except my basins csv:
basins.csv

Coastal Points / Points outside basins

I'm trying to run the delineation on a batch of points, and some of them are outside the merit basins. I tried using the buffered merit basins but I run into a nan error cause it's outside of the catchment basins, so there's no terminal_comid for it. Is there a fix for this? What would you recommend I do so they are delineated?

Some of such points are:
{'id': 217697, 'lat': 49.348897, 'lng': -124.428403}
{'id': 217808, 'lat': 49.23117, 'lng': -123.97092}
{'id': 217714, 'lat': 50.6102, 'lng': -127.2367}

A map to show ecoregions (supersets of watersheds — with common plants, animals, climactic conditions, and human settlement patterns)?

An ecoregion is "an area with characteristic flora, fauna, and climatic conditions, and related human settlement patterns, and can be comprised of several different watersheds, if the plants and animals are similar." (More on watersheds, ecoregions, and bioregions).

This is useful to organizing and planning in ways that are more ecologically focused (as opposed to relying on administrative demarcations). Also, from the experience of the Cascadia Bioregion in the PNW, many people are interested in seeing their local ecoregions — and don't quite know where to begin or find it, as this may require more technical mapmaking and natural science experience.

Given that ecoregions are a superset of watersheds, and @mheberger's excellent watershed tool , would an ecoregional map project interest you? This could take many different forms. It could simply be an image of a map/shapefiles, which illustrates ecoregions around the world. That shapefile likewise be a layer in online interactive maps, including a layer in the Global Watershed map. It could likewise be an interactive map, which allows you to traverse different scales — from local watersheds, to wider ecoregions, to widest bioregions ("bioregions: the full extent of watersheds within an interconnected area. Bioregional borders tend to be jagged, and hard, such as mountain ranges, peaks, ridges, volcanoes, continental uplifts, tectonic plates and faults, defining how energy flows within a set of boundaries.")

As I write this, it does indeed seem that analytical work to identify ecoregions and bioregions could come first — that analysis could be rendered into shapefiles, which could then easily be integrated into interactive online maps etc.

At the same time, perhaps there could be online tools and data to help bootstrap and assist with that analysis. For example, taking into account the different characteristics relevant to ecoregions:

Watersheds: (https://mghydro.com/watersheds/)
Flora: the PlantNet API and AI GeoSpecies Service, which predict local flora based on a given location.
Fauna: (not sure about tools for this)
Climactic Conditions, Geology, Soil composition, Rainfall patterns: There are several online datasets and APIs for this data
Human Settlement Patterns: Perhaps census-related APIs, like the American Community Survey and other US Census APIs? Human cultures and ways of living may also play a role here, which could require a more anthropological/sociological touch.

Is a project/collaboration like this something that would interest you?

Catchment file selection (Question)

How different are the standard catchment files to the ones marked 'minor bug fix for coastline pixels'. I tried batch delineating a couple of points using the standard catchment shapefiles, and for a point in basin 78, there seems to be a missing feature / gap in the polygon. I noticed that this is not the case in the bug-fixed file and was wondering which one was better to use in general

Catchment shapefile: no CRS definied

Hi Matthew!

I currently try to delineate catchments in Germany (Basin 23).
I followed your instructions from the front page to download the data for Germany and set up everything in config.py.
When I run python delineate.py in the terminal, I get the following output and error:

$ python delineate.py 
/home/alexander/Github/camels/camelsp/merit_hydro/delineator/venv/lib/python3.9/site-packages/geopandas/_compat.py:123: UserWarning: The Shapely GEOS version (3.10.3-CAPI-1.16.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
/home/alexander/Github/camels/camelsp/merit_hydro/delineator/venv/lib/python3.9/site-packages/geopandas/_compat.py:123: UserWarning: The Shapely GEOS version (3.10.3-CAPI-1.16.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
Reading your outlets data in: /home/alexander/Github/camels/camelsp/merit_hydro/data/outlets.csv
Finding out which Level 2 megabasin(s) your points are in
Your watershed outlets are in 1 basin(s)

Beginning delineation for 10 outlet point(s) in Level 2 Basin #23.
Reading catchment geodata in /home/alexander/Github/camels/camelsp/merit_hydro/data/shp/merit_catchments/cat_pfaf_23_MERIT_Hydro_v07_Basins_v01.shp
Traceback (most recent call last):
  File "/home/alexander/Github/camels/camelsp/merit_hydro/delineator/delineate.py", line 701, in <module>
    delineate()
  File "/home/alexander/Github/camels/camelsp/merit_hydro/delineator/delineate.py", line 449, in delineate
    catchments_gdf.to_crs(crs, inplace=True)
  File "/home/alexander/Github/camels/camelsp/merit_hydro/delineator/venv/lib/python3.9/site-packages/geopandas/geodataframe.py", line 1364, in to_crs
    geom = df.geometry.to_crs(crs=crs, epsg=epsg)
  File "/home/alexander/Github/camels/camelsp/merit_hydro/delineator/venv/lib/python3.9/site-packages/geopandas/geoseries.py", line 1124, in to_crs
    self.values.to_crs(crs=crs, epsg=epsg), index=self.index, name=self.name
  File "/home/alexander/Github/camels/camelsp/merit_hydro/delineator/venv/lib/python3.9/site-packages/geopandas/array.py", line 762, in to_crs
    raise ValueError(
ValueError: Cannot transform naive geometries.  Please set a crs on the object first.

I think this error occurs because there is no .prj file for the merit catchments on Google Drive (linked in your instruction): https://drive.google.com/drive/folders/1owkvZQBMZbvRv3V4Ff3xQPEgmAC48vJo

Your sample data for Iceland includes the file cat_pfaf_27_MERIT_Hydro_v07_Basins_v01.prj which I cannot find for my area.

There are also some incompatibility warnings at the beginning of the output but I do not think that those are causing the problem.

I tried to run python delineate.py in a fresh conda and a fresh venv environemnt, both produce the same error.

Let me know if you need anything else and thank you!

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.