Giter Club home page Giter Club logo

rrmpg's Introduction

Hi there ๐Ÿ‘‹

I'm Frederik Kratzert, Researcher @ Google working in the Flood Forecasting Team. This here is my private GitHub account, were I maintain my open source projects and publish code related to any kind of research articles.

๐Ÿค“ Research ๐Ÿค”

Most of my research is dedicated towards solving applications in environmental sciences (mainly hydrology) with machine learning.

Links

rrmpg's People

Contributors

certik avatar danklotz avatar gauchm avatar kratzert avatar visr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

rrmpg's Issues

Possible bug in Fortran

Great to see this project using Numba and very pleased to see Numba performing so well :)

I think in this https://github.com/kratzert/RRMPG/blob/master/examples/speed_comparision.ipynb
Cell [4], the %%fortran magics declaration has a possible bug.

I'm not sure the Python loop in Cell [3]:

    for i in range(rain.size):
        state_out = (1 - c) * state_in + a * rain[i]
        outflow[i] = (1 - a - b) * rain[i] + c * state_in
        state_in = state_out

is quite the same as the Fortran implementation (difference annotated):

        state_in = 0
        state_out = 0
        do t = 1,col_dim
            state_out = (1 - c) * state_in + a * inflow(t) !! this assignment is not used and will be dead-code eliminated
            outflow(t) = (1 - a - b) * inflow(t) + c * state_in !! due to state_in always being 0, `c * state_in` is eliminated, (1-a-b) will be hoisted as a loop invariant constant
            state_out = state_in !! again, unused, so DCE.
        end do

which I think ends up looking to the compiler like:

        do t = 1,col_dim
            outflow(t) = (constant) * inflow(t)
        end do

and if one looks at the assembler (gfortran -O3 -S):

.L3:
        movsd   (%r8), %xmm0
        cmpl    $1, %edx
        movl    $2, %edi
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, (%r9)
        jbe     .L5
        movsd   8(%r8), %xmm0
        cmpl    $2, %edx
        movb    $3, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 8(%r9)
        jbe     .L5
        movsd   16(%r8), %xmm0
        cmpl    $3, %edx
        movb    $4, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 16(%r9)
        jbe     .L5
        movsd   24(%r8), %xmm0
        movb    $5, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 24(%r9)
.L5:
        cmpl    %edx, %eax
        je      .L1

the repeated block:

        movsd   (%r8), %xmm0
        cmpl    $1, %edx
        movl    $2, %edi
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, (%r9)
        jbe     .L5

is essentially outflow(t) = (1 - a - b ) * inflow(t), loop unwound with %r8+offset as inflow and %r9+offset as outflow.

If you also think this is a bug, would be interesting to see how Numba performs against equivalent code.

Calibration

Hi again, and thanks for fixing the area-related bug.
I have run the model for some catchments from the CAMELS dataset in order to have an idea of the capability of the model.
In all cases results are not good. I report Nash Sutcliffe Efficiencies to give you an idea:

Although this is ok for educational purposes, it would also be nice to be able to show a case with good fitting.
I guessed the problem might be the calibration method used, which finds an optimal parameter set which corresponds to a local minima. However, this should not be the case, since scipy.optimize.differential_evolution "finds the global minimum of a multivariate function".

Implementation/use of a more advanced calibration method (e.g., the Shuffled Complex Evolution Algorithm) or the integration of available python libraries (e.g., spotpy) might solve the problem (again "might", I did not test them beforehand). But it might also not be necessary.
I am currently trying to fix parameters bounds first, to check if it helps.
Thanks again, Stefano

Citation and/or acknowledgement

I've used your implementation of GR4J as a template for a Jax-ified version that I've been using for some applications of MCMC for change detection. Here's the repo, for reference.

Do you have a paper or citable reference I can point to? For now I plan to mention the RRMPG repository in my manuscript.

Adapt all metric functions to calculate metrics for multiple simulation at once

With the last commit 27e420d all simulation functions were parallelized. We now have the output of the simulation function qsim being optionally a 2D-array, were the second dimension (first for Pythonista) holds the results for different parameter sets. It would now be good, if we could pass this array directly to all the the metric functions in rrmpg.utils.metrics, which at the moment only works with both inputs being same sized arrays. The problem is mainly the validate_array_input() function, which flattens the array, which is favorable in many cases, but not in the case of 2D simulation arrays with multiple simulations.

I'm still unsure how to resolve this. Options are:

  • handle the input validation checks for the simulation arrays in all metric functions separate.
  • extend/adapt the validate_array_input() function to have n option for not flattening the array
  • write a second array validation function for 2D arrays without flattening. Maybe the least favorable, since it would basically double main parts of the code.

At the moment I tend to the 2nd option, with an optional input argument which defaults to not flattening.

get_random_params() should be adapted to return n parameter sets on request

It might be handy for further changes to adapt the random parameter generation functions to have an additional argument, that let's the user specify how many parameter sets should be generated at once.

Therefore changes in the set_params() function might be necessary since the get_random_params will return a dictionary of numpy arrays instead of a dictionary of numbers.

Add validation run for HBV model to test suite

I need someone with MATLAB installed to run the original code for me with fixed parameters to generate a simulation timeseries to compare against. Inputs can be used from the provided data, shipping with the MATLAB code.

Parallelize simulation function of models to process multiple parameter sets in parallel

The parallelization feature of the numba library yields a great potential for further speed improvements, whenever multiple parameter for a model needs to be evaluated. This can be done in parallel, since each model run with one parameter set is independent of another parameter set. First quick test show huge speed improvements that come to play e.g. for the Monte Carlo implementation or any further optimization scheme.

Here I have published an article explaining some of the basics of numbas parallelization features. The important two things are the parallel=True flag for the decorator, as well as the prange function, used to iterate of the parameter sets.

This might need adaption in the _loss() function of each model, since they are expecting a 1-D array as output of the simulation function, but will now return a 2-D array.

Numba warning

Hello, Numba is raising a deprecation warning on the extrapolate_temperature function in rrmpg/models/cemaneige_utils.py due to a future change in functionality "NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.". Can this be addressed?
Thanks, James

HBVEdu model: missing attribute area?

Hi,
I am trying to set up an HBVEdu model by using RRMPG.
I follow step by step the procedure illustrated here (for another model). However, when I get to the step of fitting, I get the following error:

result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'])


AttributeError Traceback (most recent call last)
in
1 result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
----> 2 cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'])

C:\Anaconda3\envs\rrmpg\lib\site-packages\rrmpg-0.1.1-py3.6.egg\rrmpg\models\hbvedu.py in fit(self, qobs, temp, prec, month, PE_m, T_m, snow_init, soil_init, s1_init, s2_init)
284 # pack input arguments for scipy optimizer
285 args = (qobs, temp, prec, month, PE_m, T_m, snow_init, soil_init,
--> 286 s1_init, s2_init, self._dtype, self.area)
287 bnds = tuple([self._default_bounds[p] for p in self._param_list])
288

AttributeError: 'HBVEdu' object has no attribute 'area'

So, attribute area is missing. My problem is that I cannot understand how I should pass "area" to the model. I tried several things, such as:

area = 100 or
model = HBVEdu(area=100)
-> no effect

model.set_params(area, 100) or
model.set_params({'area':100})
-> not working, area is not a parameter (it is not in the list of parameters)

result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'],
snow_init=0.0, soil_init=0.0, s1_init=0.0, s2_init=0.0, area=100)
-> also not working

From the error, I see that the code is searching for self.area, so I believe this should be provided as an attribute of model. But I cannot understand how.
I am not proficient in python, and that is maybe the issue.
I thank you in advance if you can help.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.