
mosartwmpy's Introduction


mosartwmpy

mosartwmpy is a Python translation of MOSART-WM, a model for water routing and reservoir management originally written in Fortran. The original code can be found at IWMM and E3SM, in which MOSART is the river routing component of a larger suite of earth-science models. The motivation for rewriting is largely developer convenience: running, debugging, and adding new capabilities had become increasingly difficult due to the complexity of the codebase and unfamiliarity with Fortran. This version aims to be intuitive, lightweight, and well documented, while remaining highly interoperable. For a quick start, check out the Jupyter notebook tutorial!

getting started

Ensure you have Python >= 3.8 available (consider using a virtual environment, see the docs here for a brief tutorial), then install mosartwmpy with:

pip install mosartwmpy

Alternatively, install via conda with:

conda install -c conda-forge mosartwmpy

Download a sample input dataset spanning May 1981 by running the following and selecting option 1 for "tutorial". This will download and unpack the inputs to your current directory. Optionally specify a path to download and extract to instead of the current directory.

python -m mosartwmpy.download

Settings are defined by the merger of the mosartwmpy/config_defaults.yaml and a user specified file which can override any of the default settings. Create a config.yaml file that defines your simulation (if you chose an alternate download directory in the step above, you will need to update the paths to point at your data):

config.yaml

simulation:
  name: tutorial
  start_date: 1981-05-24
  end_date: 1981-05-26

grid:
  path: ./input/domains/mosart_conus_nldas_grid.nc

runoff:
  read_from_file: true
  path: ./input/runoff/runoff_1981_05.nc

water_management:
  enabled: true
  demand:
    read_from_file: true
    path: ./input/demand/demand_1981_05.nc
  reservoirs:
    enable_istarf: true
    parameters:
      path: ./input/reservoirs/reservoirs.nc
    dependencies:
      path: ./input/reservoirs/dependency_database.parquet
    streamflow:
      path: ./input/reservoirs/mean_monthly_reservoir_flow.parquet
    demand:
      path: ./input/reservoirs/mean_monthly_reservoir_demand.parquet

mosartwmpy implements the Basic Model Interface defined by the CSDMS, so driving it should be familiar to those accustomed to the BMI. To launch the simulation, open a python shell and run the following:

from mosartwmpy import Model

# path to the configuration yaml file
config_file = 'config.yaml'

# initialize the model
mosart_wm = Model()
mosart_wm.initialize(config_file)

# advance the model one timestep
mosart_wm.update()

# advance until the `simulation.end_date` specified in config.yaml
mosart_wm.update_until(mosart_wm.get_end_time())

model input

Input for mosartwmpy consists of many files defining the characteristics of the discrete grid, the river network, surface and subsurface runoff, water demand, and dams/reservoirs. Currently, the gridded data is expected to be provided at the same spatial resolution. Runoff input can be provided at any time resolution; each timestep will select the runoff at the closest time in the past. Currently, demand input is read monthly but will also pad to the closest time in the past. Efforts are under way for more robust demand handling.

Dams/reservoirs require four different input files: the physical characteristics, the average monthly flow expected during the simulation period, the average monthly demand expected during the simulation period, and a database mapping each GRanD ID to grid cell IDs allowed to extract water from it. These dam/reservoir input files can be generated from raw GRanD data, raw elevation data, and raw ISTARF data using the provided utility. The best way to understand the expected format of the input files is to examine the sample inputs provided by the download utility: python -m mosartwmpy.download.

multi-file input

To use multi-file demand or runoff input, use year/month/day placeholders in the file path options like so:

  • If your files look like runoff-1999.nc, use runoff-{Y}.nc as the path
  • If your files look like runoff-1999-02.nc, use runoff-{Y}-{M}.nc as the path
  • If your files look like runoff-1999-02-03, use runoff-{Y}-{M}-{D}.nc as the path, but be sure to provide files for leap days as well!
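The placeholder substitution can be sketched as follows; `resolve_path` is a hypothetical helper (not part of mosartwmpy's API) illustrating how a `{Y}`/`{M}`/`{D}` pattern might map to a concrete file for a given date:

```python
from datetime import date

def resolve_path(pattern: str, day: date) -> str:
    """Fill {Y}/{M}/{D} placeholders with zero-padded date parts (hypothetical helper)."""
    return (pattern
            .replace('{Y}', f'{day.year:04d}')
            .replace('{M}', f'{day.month:02d}')
            .replace('{D}', f'{day.day:02d}'))

# a monthly pattern resolves to one file per month
print(resolve_path('runoff-{Y}-{M}.nc', date(1999, 2, 3)))  # runoff-1999-02.nc
```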

model output

By default, key model variables are output on a monthly basis at a daily averaged resolution to ./output/<simulation name>/<simulation name>_<year>_<month>.nc. See the configuration file for examples of how to modify the outputs, and the ./mosartwmpy/state/state.py file for state variable names.

Alternatively, certain model outputs deemed most important can be accessed using the BMI interface methods. For example:

from mosartwmpy import Model

mosart_wm = Model()
mosart_wm.initialize('config.yaml')

# get a list of model output variables
mosart_wm.get_output_var_names()

# get the flattened numpy.ndarray of values for an output variable
supply = mosart_wm.get_value_ptr('supply_water_amount')

subdomains

To simulate only a subset of basins (defined here as a collection of grid cells sharing the same outlet cell), use the grid -> subdomain configuration option (see example below) and provide a list of latitude/longitude coordinate pairs, one per basin of interest (any single coordinate pair within the basin will do). For example, to simulate only the Columbia River basin and the Lake Washington regions, one could enter the coordinates for Portland and Seattle:

config.yaml

grid:
  subdomain:
    - 47.6062,-122.3321
    - 45.5152,-122.6784
  unmask_output: true

By default, the output files will still store empty NaN-like values for grid cells outside the subdomain, but for even faster simulations and smaller output files set the grid -> unmask_output option to false. With this option disabled, the output files only store values for grid cells within the subdomain. These smaller files will likely require extra processing to interoperate effectively with other models.
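Conceptually, unmasked output is produced by scattering the subdomain vector back onto the full grid and filling the remaining cells with a NaN-like value. A minimal pure-Python sketch (the model itself operates on numpy arrays, so this is an illustration, not the actual implementation):

```python
import math

def unmask(vector, mask, fill=math.nan):
    """Scatter subdomain values back onto the full grid; cells outside the mask get `fill`."""
    full = [fill] * len(mask)
    positions = [i for i, inside in enumerate(mask) if inside]
    for pos, value in zip(positions, vector):
        full[pos] = value
    return full

# 5-cell grid where only cells 1 and 3 lie inside the subdomain
print(unmask([10.0, 20.0], [False, True, False, True, False]))
# [nan, 10.0, nan, 20.0, nan]
```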

visualization

Model instances can plot the current value of certain input and output variables (those available from Model.get_output_var_names and Model.get_input_var_names):

from mosartwmpy import Model
config_file = 'config.yaml'
mosart_wm = Model()
mosart_wm.initialize(config_file)
for _ in range(8):
    mosart_wm.update()

mosart_wm.plot_variable('outgoing_water_volume_transport_along_river_channel', log_scale=True)

River transport

Using provided utility functions, the output of a simulation can be plotted as well.

Plot the storage, inflow, and outflow of a particular GRanD dam:

from mosartwmpy import Model
from mosartwmpy.plotting.plot import plot_reservoir
config_file = 'config.yaml'
mosart_wm = Model()
mosart_wm.initialize(config_file)
mosart_wm.update_until(mosart_wm.get_end_time())

plot_reservoir(
    model=mosart_wm,
    grand_id=310,
    start='1981-05-01',
    end='1981-05-31',
)

Grand Coulee

Plot a particular output variable (as defined in config.yaml) over time:

from mosartwmpy import Model
from mosartwmpy.plotting.plot import plot_variable
config_file = 'config.yaml'
mosart_wm = Model()
mosart_wm.initialize(config_file)
mosart_wm.update_until(mosart_wm.get_end_time())

plot_variable(
    model=mosart_wm,
    variable='RIVER_DISCHARGE_OVER_LAND_LIQ',
    start='1981-05-01',
    end='1981-05-31',
    log_scale=True,
    cmap='winter_r',
)

River network no tiles

If cartopy, scipy, and geoviews are installed, tiles can be displayed along with the plot:

plot_variable(
    model=mosart_wm,
    variable='RIVER_DISCHARGE_OVER_LAND_LIQ',
    start='1981-05-01',
    end='1981-05-31',
    log_scale=True,
    cmap='winter_r',
    tiles='StamenWatercolor'
)

River network with tiles

model coupling

A common use case for mosartwmpy is to run coupled with output from the Community Land Model (CLM). To see an example of how to drive mosartwmpy with runoff from a coupled model, check out the Jupyter notebook tutorial!

testing and validation

Before running the tests or validation, make sure to download the "sample_input" and "validation" datasets using the download utility python -m mosartwmpy.download.

To execute the tests, run ./test.sh or python -m unittest discover mosartwmpy/tests from the repository root.

To execute the validation, run a model simulation that includes the years 1981-1982, note your output directory, and then run python -m mosartwmpy.validate from the repository root. This will ask you for the simulation output directory, think for a moment, and then open a figure with several plots showing the NMAE (Normalized Mean Absolute Error) as a percentage, along with the spatial sums of several key variables, compared between your simulation and the validation scenario. Use these plots to determine whether the changes you have made to the code have caused unintended deviation from the validation scenario. The NMAE should be 0% across time if you have caused no deviations. A non-zero NMAE indicates numerical differences between your simulation and the validation scenario. This might be caused by changes you have made to the code, or by running a simulation with a different configuration or parameters (e.g. a larger timestep or fewer iterations). The plots of the spatial sums can help you determine what changed and the overall magnitude of the changes.
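The reported metric can be sketched as follows; this is a plain-Python illustration of normalized mean absolute error, not the exact implementation inside mosartwmpy.validate:

```python
def nmae_percent(simulated, reference):
    """Normalized mean absolute error as a percentage:
    mean(|sim - ref|) / mean(|ref|) * 100."""
    n = len(reference)
    mae = sum(abs(s - r) for s, r in zip(simulated, reference)) / n
    norm = sum(abs(r) for r in reference) / n
    return 100.0 * mae / norm

# identical series -> 0%, i.e. no deviation from the validation scenario
print(nmae_percent([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(nmae_percent([1.1, 2.0, 3.0], [1.0, 2.0, 3.0]))  # small non-zero deviation
```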

If you wish to merge code changes that intentionally cause significant deviation from the validation scenario, please work with the maintainers to create a new validation dataset.

mosartwmpy's People

Contributors

crvernon, erexer, nathalievoisin, thurber


mosartwmpy's Issues

Utilize `pkg_resources` instead of relative paths

Consider moving data used by your package into a data directory, e.g. <packagename>/data

Then call the data where needed in your code using pkg_resources; see overview for rationale (https://setuptools.readthedocs.io/en/latest/pkg_resources.html).

All relative paths should be replaced with pkg_resources calls.

This will work like the following, an example from your config.py file:

import pkg_resources

from benedict import benedict
from benedict.dicts import benedict as Benedict

def get_config(config_file_path: str) -> Benedict:
    """Configuration object for the model, using the Benedict type.
    
    Args:
        config_file_path (string): path to the user defined configuration yaml file
    
    Returns:
        Benedict: A Benedict instance containing the merged configuration
    """

    default_config_file = pkg_resources.resource_filename('mosartwmpy', 'data/config_defaults.yaml')
    config = benedict(default_config_file, format='yaml')
    if config_file_path and config_file_path != '':
        config.merge(benedict(config_file_path, format='yaml'), overwrite=True)
    
    return config

NOTE: include your data in a MANIFEST.in file (see https://packaging.python.org/guides/using-manifest-in/) and ensure that you include include_package_data=True in your setup function in setup.py (see https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html)

Error when running with the the TBB threading layer

Reported by @erexer: When running mosartwmpy with numba using the TBB threading layer, an error occurs when trying to read the TypedDict that represents the reservoir/grid cell dependency database. We need to either force workqueue as the threading layer or figure out what's causing the TBB layer to fail. It might be that the code expects the threads to share memory, but the TBB threads do not.

Allow mid-month writing of NetCDF

Specs:

  • Python 3.9 set to install virtual env based on package reqs
  • MacOSX HighSierra 10.13.6

Expected:
Allow mid-month start_date writing to NetCDF file

Current:
Under dates:

  start_date: 1981-05-24
  end_date: 1981-05-26

I get the following error:

/Users/d3y010/repos/github/mosartwmpy/venv/bin/python /Users/d3y010/repos/github/mosartwmpy/example.py
Traceback (most recent call last):
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/file_manager.py", line 199, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/lru_cache.py", line 53, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/Users/d3y010/repos/github/mosartwmpy/output/unit_tests/unit_tests_1981_05.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/d3y010/repos/github/mosartwmpy/example.py", line 14, in <module>
    mosart_wm.update_until(datetime.combine(datetime(1981, 5, 26), time.max).timestamp())
  File "/Users/d3y010/repos/github/mosartwmpy/mosartwmpy/model.py", line 205, in update_until
    self.update()
  File "/Users/d3y010/repos/github/mosartwmpy/mosartwmpy/model.py", line 195, in update
    raise e
  File "/Users/d3y010/repos/github/mosartwmpy/mosartwmpy/model.py", line 192, in update
    update_output(self)
  File "/Users/d3y010/repos/github/mosartwmpy/mosartwmpy/output/output.py", line 37, in update_output
    write_output(self)
  File "/Users/d3y010/repos/github/mosartwmpy/mosartwmpy/output/output.py", line 105, in write_output
    nc = open_dataset(filename).load()
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/api.py", line 508, in open_dataset
    store = backends.NetCDF4DataStore.open(
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 358, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 314, in __init__
    self.format = self.ds.data_model
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 367, in ds
    return self._acquire()
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 361, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/file_manager.py", line 187, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/Users/d3y010/repos/github/mosartwmpy/venv/lib/python3.9/site-packages/xarray/backends/file_manager.py", line 205, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2358, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1926, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'/Users/d3y010/repos/github/mosartwmpy/output/unit_tests/unit_tests_1981_05.nc'
Process finished with exit code 1

Add correct license and disclaimer

Please add the following LICENSE and DISCLAIMER files to the root dir and change the current license to BSD2-Simplified:

LICENSE:

mosartwmpy

Copyright (c) 2021, Battelle Memorial Institute

Open source under license BSD 2-Clause

1.	Battelle Memorial Institute (hereinafter Battelle) hereby grants permission to any person or entity lawfully obtaining a copy of this software and associated documentation files (hereinafter “the Software”) to redistribute and use the Software in source and binary forms, with or without modification.  Such person or entity may use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and may permit others to do so, subject to the following conditions:
•	Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.
•	Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
•	Other than as used herein, neither the name Battelle Memorial Institute or Battelle may be used in any form whatsoever without the express written consent of Battelle.

2.	THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL BATTELLE OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

DISCLAIMER:

mosartwmpy

Copyright (c) 2021, Battelle Memorial Institute

Open source under license BSD 2-Clause

Open Source Disclaimer:

This material was prepared as an account of work sponsored by an agency of the United States Government.  Neither the United States Government nor the United States Department of Energy, nor Battelle, nor any of their employees, nor any jurisdiction or organization that has cooperated in the development of these materials, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness or any information, apparatus, product, software, or process disclosed, or represents that its use would not infringe privately owned rights.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or Battelle Memorial Institute. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

PACIFIC NORTHWEST NATIONAL LABORATORY
operated by
BATTELLE
for the
UNITED STATES DEPARTMENT OF ENERGY
under Contract DE-AC05-76RL01830

JOSS-REVIEW: tests

Dear @thurber,

great that you included tests in your package.
I think they are covering the main functions of the BMI model, but I still think one addition would be good: check whether the set_value(), get_value() functions yield the correct results. In my opinion, these functions are the main BMI functions in addition to initialize() and update(). While the latter two are covered, testing the first two should be included!

Thanks!

add support for non-standard calendars in input files

Currently, using a NetCDF file with a non-standard calendar as a runoff or demand input to mosartwmpy will result in an error complaining about not being able to index a CFTime with a datetime.

When reading input, we should check the type of the time index and convert to DatetimeIndex if needed, i.e.:

df.indexes['time'] = df.indexes['time'].to_datetimeindex()

JOSS-REVIEW: documentation

hi @thurber,

i find the documentation overall good, but also pretty concise.
it would be great to include a jupyter notebook with an example application of your model.
this would not only give users an idea how a typical model workflow would look like, but also provide you with an opportunity to explain the model's functionality and set-up step-by-step and in greater detail.

i furthermore would like to see a 'statement of need' (similar as in the paper) in the documentation. not everyone is going to read the paper, but many will just go through the readme or rtd.

reference to: openjournals/joss-reviews#3221.

Formalize reservoir placement and dependency strategy

The general algorithm is:

Dam/Reservoir Placement

  • Begin with the GRanD reservoirs v1.3
  • Filter down to the subset of dams that appear in the ISTARF dataset and appear within the mosartwmpy domain
  • Initially place the dam/reservoir within the mosartwmpy grid cell corresponding to the GRanD dam location lon/lat
  • If multiple dams appear in one grid cell, check if any appear within a threshold (85%) of the cell border and would be located differently by placing them according to the GRanD reservoir centroid -- if so, move them
  • Remove dams that occur in the same grid cell by preferring to keep: largest capacity, then largest drainage area, then largest GRanD ID

Dam/Reservoir Dependencies

  • Beginning at the grid cell containing the dam/reservoir, follow the river network downstream and add as dependencies any grid cell that:
    • is within a specified radius of the river (200km)
    • shares the same outlet grid cell as the dam/reservoir grid cell
    • is lower in elevation than the dam/reservoir grid cell
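The per-cell dependency check above can be sketched as a simple predicate; the dict fields below are hypothetical stand-ins, since the real utility operates on the grid and river network data:

```python
def is_dependent(cell, reservoir_cell, max_radius_km=200.0):
    """A grid cell depends on a reservoir if it is within the radius of the river,
    shares the reservoir's outlet cell, and sits lower in elevation.
    Field names here are hypothetical, for illustration only."""
    return (cell['distance_to_river_km'] <= max_radius_km
            and cell['outlet_id'] == reservoir_cell['outlet_id']
            and cell['elevation_m'] < reservoir_cell['elevation_m'])

reservoir = {'outlet_id': 7, 'elevation_m': 300.0}
near_cell = {'distance_to_river_km': 50.0, 'outlet_id': 7, 'elevation_m': 120.0}
far_cell = {'distance_to_river_km': 500.0, 'outlet_id': 7, 'elevation_m': 120.0}
print(is_dependent(near_cell, reservoir))  # True
print(is_dependent(far_cell, reservoir))   # False
```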

Elevation

  • Elevation is determined by upscaling the HYDROSHEDS 30s DEM elevation data to the mosartwmpy grid resolution (1/8 degree), using the average elevation of all HYDROSHED cells contained within the mosartwmpy cell
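The upscaling step can be sketched as block-averaging: each coarse cell takes the mean of the fine DEM cells it contains. A simplified illustration assuming an exact integer ratio between the two resolutions:

```python
def block_average(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2D grid (list of rows)."""
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [fine[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# 4x4 fine grid upscaled to a 2x2 coarse grid
fine = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]
print(block_average(fine, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```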

better error messages for missing data

currently if you try running a simulation for a time period in which you are missing input data, wmpy will spit out a huge illegible stack trace that just means it can't find the data... need to catch these errors and give friendly messages

[JOSS review] Python version and Dependencies

I am wondering what the reason is behind not supporting Python 3.7+. What specific features of Python 3.9 are being used in the code? Pinning the dependencies can limit the usage of the project with other packages. Although one can create several environments, having such unnecessarily pinned dependencies might affect the user experience.

The requirements.txt includes unnecessary dependencies:

recommonmark
setuptools
sphinx
sphinx-rtd-theme

Some other suggestions:

  • I suggest to use conda instead of pyenv, since you're relying on scientific Python libraries, they tend to be orders of magnitudes faster when installed via conda-forge.
  • I don't think you need both pyarrow and fastparquet since they both do the same thing. pyarrow is the default parquet engine in pandas.
  • When using h5netcdf instead of netcdf4 you need to be aware of, and inform the user about, the differences that are documented on xarray's website:

There may be minor differences in the Dataset object returned when reading a NetCDF file with different engines. For example, single-valued attributes are returned as scalars by the default engine=netcdf4, but as arrays of size (1,) when reading with engine=h5netcdf.

Overall, I think the dependencies need a reevaluation.

EDIT: Reference to openjournals/joss-reviews#3221

JOSS-REVIEW: last minor issues

dear @thurber,

i have managed to install the package with conda now, and it works. great!

a few minor issues should be solved before i will give my green light for publication:

  • see my comment in the solved issue #49;
  • another one I have placed in #47;
  • please include ipython in the dependency list. this way, users can explore the model's functions more interactively;

not required for JOSS, but always great if available:

let me know if at least the top three are implemented.
looking forward to it!

[JOSS review] Hard-coded output directory

I think hard-coding the output directory can cause unintended issues since the outputs are usually large files. The user might prefer to save the outputs to specific directories. Either explicitly mention that the output directory is hard-coded and cannot be changed, or make it configurable (my personal preference).

The README file says:

By default, key model variables are output on a monthly basis at a daily averaged resolution to ./output/<simulation name>/<simulation name>_<year>_<month>.nc. See the configuration file for examples of how to modify the outputs, and the ./mosartwmpy/state/state.py file for state variable names.

This doesn't suggest that the output directory is hard-coded and suggests that it can be changed via the config file.

EDIT: Reference to openjournals/joss-reviews#3221

JOSS-REVIEW: paper

Dear @thurber,

I have started reviewing the JOSS paper.

Here a few comments that should be included in a revised version:

  • In the first paragraph of the 'Statement of Need' replace 'tightly-coupled codebase' with 'code' or something that is easier to understand for a less expert audience.
  • While it's in there, I recommend to more clearly specify in 'Statement of Need' what the problem is your software aims to solve and what the intended audience is.
  • It is not clear why you include BMI functionality. Any particular use case for it?
  • Talking about BMI, I feel that your work is very close to some work of mine in which I added a BMI to various hydrological models and put them into a modelling framework (https://doi.org/10.5194/nhess-19-1723-2019). This would be a good addition to the BMI section of the paper and would fill in some content for the section 'State of the field', which is not yet really apparent in the paper.
  • As such, please add a 'State of the field' section or at least a section that covers the relevant content.
  • It is not clear whether the 'old' FORTRAN version is now replaced with the Python version or, if not, they are updated when model code changes in one of the versions. Please explain (briefly) how this refactoring of the model affects long-term model development.
  • The section 'Functionalities and limitations' is slightly too technical for a JOSS paper (in my opinion). Please remove or rewrite it such that it appeals to a non-expert audience.

Besides these points, well-written paper and also references are fine.

Thanks for addressing these points.

finish implementing the BMI methods

i.e. for getting/setting the inputs/outputs

also update the code to not read input from file if it is specified by the driver (i.e. when running coupled)

[JOSS review] Logging and verbosity

Since many Python users use Jupyter as their IDE, the logging info should be redirected to stdout so the logs don't show up with red background. You can do so by configuring the logger, for example, as follows:

import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter(""))
logger.handlers = [handler]
logger.propagate = False

Also, the model is unnecessarily verbose by default. It might not matter much when you run the code via command-line but when you run it via Jupyter it causes issues. A much better approach is to use tqdm. I think a quick fix can be reducing the verbosity by default.

EDIT: Reference to openjournals/joss-reviews#3221

create helper utilities for generating mean flow and demand files from previous runs

Folks may want this rather than having to generate the entire reservoir parameters from create_grand_parameters.py.

The task would be to read in result data from a completed simulation, calculate the monthly average flow and demand at reservoir locations, and save to parquet format, ideally with appropriate metadata regarding the timeframe and inputs used in the simulation.
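The grouping step such a utility would perform can be sketched as follows; this is plain Python operating on in-memory (date, value) pairs, with the simulation-output reading and parquet writing omitted:

```python
from collections import defaultdict
from datetime import date

def monthly_means(series):
    """Average daily values by (year, month). `series` is a list of (date, value) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in series:
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# daily flows at one reservoir, spanning two months
flows = [(date(1981, 5, 1), 10.0), (date(1981, 5, 2), 20.0), (date(1981, 6, 1), 30.0)]
print(monthly_means(flows))  # {(1981, 5): 15.0, (1981, 6): 30.0}
```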

create helper utility for converting HYDROSHED elevation data to parquet format

The HYDROSHEDS elevation data is available here -- for CONUS applications we can use the 30sec void-filled elevation BIL files for North America and Central America.

Since the utility script create_grand_parameters.py expects this elevation data to already be converted to parquet, we should provide a script that performs this conversion.

install fails on Constance

with pip install:

  • first problem is that numba tries to install before numpy
  • second problem is that llvmlite fails to install

with conda install:

  • version 0.2.0 isn't available on conda-forge for some reason
  • various problems

JOSS-REVIEW: update installation instructions

Dear @thurber,
I will review this model for JOSS and look very much forward to it.
I have cloned the source code and am now trying to install it on my local machine.
Doing so, I find that the installation instructions are too short - currently, you are only referred to a virtual env if you have Python >=3.9. But how do I install it if I have a lower version number?
Please add a more elaborate yet concise installation instruction to your README + docs. Once I know how to install the package (and it actually works), I will continue the review process.
Thanks!

Error running tutorial notebook

I installed the latest version from Github and ran the tutorial notebook. But after running the following line:

mosart_wm.plot_variable('surface_water_amount', log_scale=True)

I got the following error:

ValueError                                Traceback (most recent call last)
c:\mosartw\mosartwmpy\notebooks\tutorial.ipynb Cell 10 in <cell line: 1>()
----> [1](vscode-notebook-cell:/c%3A/mosartw/mosartwmpy/notebooks/tutorial.ipynb#ch0000009?line=0) mosart_wm.plot_variable('surface_water_amount', log_scale=True)

File c:\Users\Hamed Khorasani\mambaforge\envs\mosartwm\lib\site-packages\mosartwmpy\model.py:269, in Model.plot_variable(self, variable, log_scale)
    263 def plot_variable(
    264         self,
    265         variable: str,
    266         log_scale: bool = False,
    267 ):
    268     """Display a colormap of a spatial variable at the current timestep."""
--> 269     data = self.unmask(self.get_value_ptr(variable)).reshape(self.get_grid_shape())
    270     if log_scale:
    271         data = np.where(data > 0, data, np.nan)

File c:\Users\Hamed Khorasani\mambaforge\envs\mosartwm\lib\site-packages\mosartwmpy\model.py:294, in Model.unmask(self, vector)
    292 elif vector.dtype == bool:
    293     unmasked[:] = False
--> 294 unmasked[self.mask] = vector
    295 return unmasked

ValueError: NumPy boolean array indexing assignment cannot assign 103936 input values to the 80053 output values where the mask is true

I was wondering if you could kindly provide some information about what causes this error and what changes should I make.

Investigate better parallel processing schemes

Current parallel implementation only scales efficiently to ~8 CPUs due to reliance on numexpr vector math. May need to implement true MPI via mpi4py or otherwise convert vector based math to scalar cell-by-cell math.
