provides an opportunity to analyze large-scale baseflow trends under global change.
To fill the gaps in time-series baseflow datasets, we introduced a machine learning approach called long short-term memory (LSTM) networks to develop a monthly baseflow dataset.
To train more effectively across basins, we compared the standard LSTM with four variant architectures that take additional static catchment properties as input. Results show that three of the variants (Joint, Front, and EA-LSTM) outperform the standard LSTM, with a median Kling-Gupta efficiency across basins greater than 0.85.
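The Kling-Gupta efficiency used above combines correlation, variability bias, and mean bias into a single score (1 is a perfect match). A minimal NumPy sketch of the standard 2009 formulation (the function name `kge` is ours, not part of this repository):

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A simulation identical to the observations scores exactly 1; doubling every simulated value keeps r = 1 but doubles both alpha and beta, giving 1 − √2 ≈ −0.41.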
Based on the Front LSTM, we obtained a monthly baseflow dataset with 0.25° spatial resolution across the contiguous United States from 1981 to 2020, which can be downloaded from the release page.
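The variant architectures above feed time-invariant static catchment properties alongside the meteorological time series. One straightforward way to build such a joint input (a sketch of the general idea, not the exact implementation in this repository; array names are ours) is to repeat the static vector along the time axis and concatenate it with the dynamic features:

```python
import numpy as np

def combine_inputs(dynamic, static):
    """Attach static catchment attributes to every time step.

    dynamic: (time, n_dyn) monthly forcing series for one basin
    static:  (n_stat,) time-invariant basin properties
    returns: (time, n_dyn + n_stat) array usable as LSTM input
    """
    tiled = np.repeat(static[None, :], dynamic.shape[0], axis=0)
    return np.concatenate([dynamic, tiled], axis=1)

# e.g. 480 months of 5 forcing variables plus 10 static attributes
x = combine_inputs(np.zeros((480, 5)), np.ones(10))
# x has shape (480, 15); the last 10 columns are constant in time
```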
```
├── configs                <- Hydra configuration files
│   ├── constant           <- Folder paths and constants
│   ├── dataset            <- Configs of PyTorch dataset
│   ├── datasplit          <- Split dataset into train and test
│   ├── hydra              <- Configs of Hydra logging and launcher
│   ├── loss               <- Configs of loss function
│   ├── model              <- Configs of PyTorch model architectures
│   ├── optimizer          <- Configs of optimizer
│   ├── trainer            <- Configs of validation metrics and trainer
│   ├── tuner              <- Configs of Optuna hyperparameter search
│   └── config.yaml        <- Main project configuration file
│
├── data                   <- Baseflow, time series, and static properties
│
├── logs                   <- Logs generated by Hydra and PyTorch loggers
│
├── saved                  <- Saved evaluation results and model parameters
│
├── src
│   ├── datasets           <- PyTorch datasets
│   ├── datasplits         <- Dataset splitter for train and test
│   ├── models             <- PyTorch model architectures
│   ├── trainer            <- Class managing training process
│   ├── utils              <- Utility scripts for metric logging
│   ├── evaluate.py        <- Model evaluation pipelines
│   ├── prepare.py         <- Data preparation pipelines
│   └── simulate.py        <- Simulate gridded baseflow
│
├── run.py                 <- Run pipeline with chosen configuration
│
├── main.py                <- Main process for the whole project
│
├── .gitignore             <- List of files/folders ignored by git
├── requirements.txt       <- File for installing python dependencies
├── LICENSE
└── README.md
```
```python
from src import prepare

# download data from ERA5 and Google Earth Engine;
# cfg is the Hydra configuration composed from configs/config.yaml
prepare(cfg['constant'])
```
```shell
# detailed settings are in optuna.yaml
python run.py -m tuner=optuna
```
```shell
# evaluate Front LSTM using test_size=0.2
python run.py -m model=front dataset.eco=CPL,NAP,NPL

# train Front LSTM using test_size=0
python run.py -m model=front datasplit=full dataset.eco=CPL,NAP,NPL
```
```python
from src import simulate

# load the trained model for each ecoregion
checkpoint = 'saved/train/front/CPL/models/model_latest.pth'
simulate(checkpoint)
```
- Xie, J., Liu, X., Tian, W., Wang, K., Bai, P., & Liu, C. (2022). Estimating Gridded Monthly Baseflow From 1981 to 2020 for the Contiguous US Using Long Short-Term Memory (LSTM) Networks. Water Resources Research, 58(8), e2021WR031663. https://doi.org/10.1029/2021WR031663