Feature description
The development of COBYQA should be equipped with the following infrastructure, and it should be put in place BEFORE making any further changes to the current version of the code.
- Unit tests.
This means tests for the correctness of each component of the software.
They should be triggered by each push and also run periodically (e.g., once a day).
The suite should contain sufficiently many "tough" tests (strongly ill-conditioned problems, data with NaN/Inf, improper inputs, large-scale problems, etc.). Testing code with only normal data is a joke.
The tests should be both randomized and reproducible. The randomization seed should be changed regularly (I would change it once a week, so that there is a week to debug if the tests detect something wrong).
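One way to get randomization that is reproducible within a week yet rotates automatically is to derive the seed from the ISO calendar week. This is only a sketch; the function name `weekly_seed` and the use of NumPy's generator are illustrative, not part of COBYQA:

```python
import datetime

import numpy as np


def weekly_seed() -> int:
    """Derive a deterministic seed from the current ISO calendar week.

    The seed is constant within a week (so failures are reproducible)
    and changes automatically every week (so the tests are randomized
    over time without manual intervention).
    """
    year, week, _ = datetime.date.today().isocalendar()
    return 10000 * year + week


# Every test in the suite draws its random data from this generator.
rng = np.random.default_rng(weekly_seed())
x0 = rng.standard_normal(10)  # randomized yet reproducible starting point
```

A CI job that logs `weekly_seed()` in its output makes it trivial to replay a failing weekly run locally.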
- Verifications.
This will be needed (not now) if you implement a new version that aims to behave the same as a standard version (e.g., when porting the software to C++, Julia, etc). The verification script should verify that the new and standard versions behave the same on a sufficiently large set (hundreds) of problems.
As with the unit tests, the verifications should be triggered by each push and also run periodically, should contain sufficiently many tough tests, and should be randomized and reproducible with the seed changed regularly.
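A minimal sketch of such a verification driver, assuming hypothetical callables `solve_new` and `solve_ref` that each map an objective and a starting point to the returned iterate (the names and the tolerance are illustrative):

```python
import numpy as np


def verify_equivalence(solve_new, solve_ref, problems, seed, rtol=1e-12):
    """Check that two implementations behave identically on a problem set.

    `problems` is a list of (objective, x0) pairs. Returns the indices
    of the problems on which the two versions disagree.
    """
    rng = np.random.default_rng(seed)
    failures = []
    for i, (fun, x0) in enumerate(problems):
        # Perturb the starting point reproducibly to harden the test;
        # both versions receive the same perturbed point.
        x0 = x0 + 1e-10 * rng.standard_normal(x0.size)
        x_new = solve_new(fun, x0)
        x_ref = solve_ref(fun, x0)
        if not np.allclose(x_new, x_ref, rtol=rtol, atol=0.0):
            failures.append(i)
    return failures
```

In practice the problem set would come from a collection such as CUTEst, and the comparison would cover the whole returned result (iterate, objective value, exit status), not just the final point.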
- Profiling.
This means profiling the code against a previous/standard version, in order to make sure that the changes you make really improve the performance rather than the opposite. Using the artifacts of GitHub Actions, we can also keep a record of the (recent) evolution of the performance.
It is extremely important to make sure that the comparisons are fair and unbiased because we will make decisions/changes according to the comparisons! We also have to make sure that the difference we observe is truly due to the decisions/changes we make rather than because of rounding errors or pure luck.
To achieve this, we have to ensure that the problem set is sufficiently large, and we have to randomize the tests. Randomization means introducing a reasonable amount of perturbation into the problems and the data (initial point, initial trust-region radius, etc.) and then comparing the average performance. The perturbation should not be too large: a perturbation as small as the machine epsilon is already enough to produce significant changes in the iterative process, while a perturbation that is too large may lead to "low performance" of all solvers. Note that this is not the same as testing noisy problems, which should also be included in the profiling.
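To illustrate the kind of perturbation meant above, here is a sketch that averages a solver's cost (e.g., number of function evaluations) over mildly perturbed starting points; the callable names, the perturbation level, and the cost measure are all illustrative:

```python
import numpy as np


def average_cost(solver, problems, n_runs=10, noise=1e-8, seed=0):
    """Average the cost of `solver` over mildly perturbed problem copies.

    `solver` maps (objective, x0) to a scalar cost; `problems` is a list
    of (objective, x0) pairs. The perturbation is tiny but enough to
    change the iterative process, so the average is less sensitive to
    rounding errors and pure luck than a single run would be.
    """
    rng = np.random.default_rng(seed)
    costs = []
    for fun, x0 in problems:
        for _ in range(n_runs):
            x0p = x0 * (1.0 + noise * rng.standard_normal(x0.size))
            costs.append(solver(fun, x0p))
    return float(np.mean(costs))
```

A fair comparison would then run both versions of the solver through the same driver, with the same seed, and compare the averages (or, better, full performance/data profiles) rather than single runs.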
- "Programming by Contract".
I strongly suggest you add preconditions and postconditions to the code.
See, e.g.,
https://github.com/libprima/prima/blob/367a8e7bb3eb4a388f10f72d776c7ad0898bb75e/fortran/newuoa/trustregion.f90#L123
https://github.com/libprima/prima/blob/367a8e7bb3eb4a388f10f72d776c7ad0898bb75e/fortran/newuoa/trustregion.f90#L435
Pre/postconditions overlap with unit tests, but neither can replace the other. They may seem tedious, and some conditions may even feel silly, but the benefit is enormous. This is especially essential if you want to provide a "reference implementation" for other people/languages. The pre/postconditions not only ensure the basic correctness of your code, but also tell others the basic requirements that have to be met if they make a new implementation.
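In Python, such contracts can be written as plain `assert` statements (which are stripped when the interpreter runs with `-O`). A sketch in the style of the PRIMA links above, on a made-up Cauchy-like trust-region step, not COBYQA's actual code:

```python
import numpy as np


def trust_region_step(g, delta):
    """Return a steepest-descent step of length `delta` along -g."""
    # Preconditions
    assert np.all(np.isfinite(g)), "g must be finite"
    assert delta > 0, "trust-region radius must be positive"

    gnorm = np.linalg.norm(g)
    d = -(delta / gnorm) * g if gnorm > 0 else np.zeros_like(g)

    # Postconditions
    assert d.shape == g.shape, "step has the same shape as the gradient"
    assert np.linalg.norm(d) <= delta * (1 + 1e-12), "step stays in the trust region"
    return d
```

The postconditions document, in executable form, exactly what any reimplementation of this routine must guarantee.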
I have implemented everything mentioned above for the PRIMA project (see its GitHub Actions and workflows). Programming is not something I particularly enjoy or am good at, but the situation is different for you. Since I can do this, I believe that you can do similar things, but much better than what I have done (or can ever possibly do).
Thanks, and enjoy!
Proposed solution
No response
Considered alternatives
No response