This is a minimal implementation of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimization algorithm.
Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via mutation and recombination) and selection: in each generation (iteration) new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, and then some individuals are selected for the next generation based on their fitness or objective function value f(x).
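The variation/selection loop described above can be illustrated with a toy (mu, lambda) evolution strategy. This is a hypothetical sketch for illustration only (the function name and the simple geometric step-size decay are not part of this package; CMA-ES adapts the step size and the full covariance matrix automatically instead):

```julia
# Toy (mu, lambda) evolution strategy: each generation creates lambda
# offspring by mutating randomly chosen parents (variation), then keeps
# the mu best by objective value (selection). Hypothetical helper, not
# part of this package.
function simple_es(f, x0; mu = 5, lambda = 20, sigma = 0.3, iters = 200)
    parents = [x0 + sigma * randn(length(x0)) for _ in 1:mu]
    for _ in 1:iters
        # variation: mutate a randomly chosen parent with Gaussian noise
        offspring = [parents[rand(1:mu)] + sigma * randn(length(x0)) for _ in 1:lambda]
        # selection: keep the mu offspring with the best (lowest) f value
        parents = sort(offspring, by = f)[1:mu]
        # crude geometric step-size decay; CMA-ES adapts this automatically
        sigma *= 0.97
    end
    return parents[1]
end
```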
http://en.wikipedia.org/wiki/CMA-ES
This implementation is based on the original Matlab purecmaes.m:
https://www.lri.fr/~hansen/cmaes_inmatlab.html
purecmaes.m is a minimalistic implementation running in Matlab and Octave. It is mainly provided for educational purposes: reading, understanding, and running basic experiments. It contains roughly 100 lines, with 20 lines reflecting the core of the algorithm. This implementation has only rudimentary termination conditions; under some circumstances it might be numerically inefficient, and it should probably not be used for running "real" experiments.
However, most of the core algorithm from cmaes.m is implemented here. Features such as additional stopping options, a diagonal-only covariance matrix, and noisy objective function handling are missing.
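The core generation loop can be sketched as follows. This is a condensed, hypothetical transcription of purecmaes.m for reading purposes, not this package's actual source; the function name is made up, and the strategy-parameter formulas follow Hansen's published defaults:

```julia
using LinearAlgebra

# Condensed CMA-ES generation loop, following purecmaes.m (sketch only).
function cmaes_sketch(f, xmean, sigma; stopeval = 1e3 * length(xmean)^2)
    N = length(xmean)
    lambda = 4 + floor(Int, 3 * log(N))           # population size
    mu = lambda ÷ 2                                # number of selected parents
    w = log(mu + 1/2) .- log.(1:mu); w /= sum(w)   # recombination weights
    mueff = 1 / sum(w .^ 2)                        # variance-effective mu
    # strategy-parameter defaults from the literature
    cc = (4 + mueff / N) / (N + 4 + 2 * mueff / N)
    cs = (mueff + 2) / (N + mueff + 5)
    c1 = 2 / ((N + 1.3)^2 + mueff)
    cmu = min(1 - c1, 2 * (mueff - 2 + 1 / mueff) / ((N + 2)^2 + mueff))
    damps = 1 + 2 * max(0, sqrt((mueff - 1) / (N + 1)) - 1) + cs
    pc = zeros(N); ps = zeros(N)                   # evolution paths
    C = Matrix{Float64}(I, N, N)                   # covariance matrix
    chiN = sqrt(N) * (1 - 1 / (4 * N) + 1 / (21 * N^2))
    counteval = 0
    while counteval < stopeval
        # eigendecomposition C = B * Diagonal(D.^2) * B'
        E = eigen(Symmetric(C)); D = sqrt.(E.values); B = E.vectors
        # sample lambda offspring x_k ~ N(xmean, sigma^2 * C)
        Z = randn(N, lambda)
        X = xmean .+ sigma .* (B * (D .* Z))
        fit = [f(X[:, k]) for k in 1:lambda]; counteval += lambda
        idx = sortperm(fit)
        xold = xmean
        xmean = X[:, idx[1:mu]] * w                # weighted recombination
        # cumulation: update evolution paths for step size and covariance
        y = (xmean - xold) / sigma
        ps = (1 - cs) * ps + sqrt(cs * (2 - cs) * mueff) * (B * ((B' * y) ./ D))
        hsig = norm(ps) / sqrt(1 - (1 - cs)^(2 * counteval / lambda)) / chiN < 1.4 + 2 / (N + 1)
        pc = (1 - cc) * pc + hsig * sqrt(cc * (2 - cc) * mueff) * y
        # rank-one + rank-mu covariance matrix update
        artmp = (X[:, idx[1:mu]] .- xold) / sigma
        C = (1 - c1 - cmu) * C + c1 * (pc * pc' + (1 - hsig) * cc * (2 - cc) * C) +
            cmu * artmp * Diagonal(w) * artmp'
        # step-size adaptation
        sigma *= exp((cs / damps) * (norm(ps) / chiN - 1))
        fit[idx[1]] < 1e-10 && break               # rudimentary stopping rule
    end
    return xmean
end
```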
cmaes(objFun::Function, pinit::AbstractVector, sigma::AbstractVector; lambda=0, stopeval=0, stopDeltaFitness=1e-12)

objFun
is the function to be minimized, mapping R^N to R

pinit
is an N-vector containing the initial condition

sigma
is an N-vector or a scalar that defines the initial diagonal covariance matrix

With the optional arguments:

lambda
is the population size

stopeval
is the maximum number of function evaluations; by default stopeval = 1e3*N^2

stopDeltaFitness
is the minimum change in the objective function value before stopping
Test on the Rosenbrock function:
rosenbrock(p) = (1-p[1])^2 + 100*(p[2]-p[1]^2)^2
pmin=cmaes(rosenbrock,-rand(2),ones(2));
6-3 CMA-ES
iter: 25 fcount: 150 fval: 6.77e-01 axis-ratio: 2.90e+00
iter: 50 fcount: 300 fval: 9.58e-02 axis-ratio: 5.90e+00
iter: 75 fcount: 450 fval: 1.45e-02 axis-ratio: 1.14e+01
iter: 100 fcount: 600 fval: 3.69e-08 axis-ratio: 2.22e+01
Correlation matrix:
2x2 Float64 Array:
1.0 0.99713
0.99713 1.0
pmin:
[1.0000052051864698,1.0000108150577227]
fmin:
4.346875072347545e-11
Here the high correlation between p[1] and p[2] indicates that the minimum (1,1) is located in a narrow valley.
Fitting a damped sinusoid:
x = 8*rand(100);
model(x,p) = p[1]*exp(-p[2]*x).*sin(p[3]*2*pi*x + p[4]);
data = model(x,[2,0.1,0.5,1]) + 0.1*randn(100);
errFun(p) = sum( (data-model(x,p)).^2 )
using ASCIIPlots
scatterplot(x,data)
-------------------------------------------------------------
|^ | 2.04
|^ |
| ^ ^ |
| ^^ |
| ^ ^^ |
| ^^ ^ |
| ^ ^ ^ ^ ^ ^^ |
| ^ ^ ^ ^ ^^|
| ^ ^ ^ ^ |
| ^ ^ ^ ^ ^ |
| ^ ^ ^ |
| ^ ^ ^ ^ ^ ^ ^ |
| ^ ^ ^ ^ ^^ |
| ^ ^ ^^ |
| ^ ^ ^ ^ ^ ^^ |
| ^ ^ ^ ^ ^^ |
| ^ ^^ ^^ ^ |
| ^ ^ ^ |
| ^ |
| ^^ | -1.92
-------------------------------------------------------------
0.20 7.94
pmin = cmaes(errFun,rand(4),ones(4),lambda=16,stopeval=5000)
16-8 CMA-ES
iter: 25 fcount: 400 fval: 8.44e+01 axis-ratio: 5.53e+00
iter: 50 fcount: 800 fval: 3.34e+01 axis-ratio: 9.10e+00
iter: 75 fcount: 1200 fval: 1.30e+00 axis-ratio: 3.37e+01
iter: 100 fcount: 1600 fval: 1.13e+00 axis-ratio: 8.09e+01
iter: 125 fcount: 2000 fval: 1.13e+00 axis-ratio: 8.38e+01
Correlation matrix:
4x4 Float64 Array:
1.0 -0.827892 0.193184 -0.220111
-0.827892 1.0 -0.204811 0.257491
0.193184 -0.204811 1.0 -0.822116
-0.220111 0.257491 -0.822116 1.0
pmin:
[-1.9586661149601514,0.09269418356545647,0.50064486101525,-2.1538011517772677]
fmin:
1.1264664454434268
4-element Float64 Array:
-1.95867
0.0926942
0.500645
-2.1538
scatterplot(x,model(x,pmin))
-------------------------------------------------------------
|^ | 1.89
|^ |
|^^ |
| ^ ^ |
| ^^ |
| ^ ^ ^ ^^^ |
| ^ ^ ^ ^ ^^ |
| ^ ^ ^ ^ ^|
| ^ ^ ^ ^ |
| ^ ^ ^ ^ |
| ^ ^ ^ |
| ^ ^ ^ ^ ^ ^ |
| ^ |
| ^ ^ ^ ^^ |
| ^ ^ ^ ^ |
| ^ ^ ^ ^ ^^ |
| ^ ^ ^^ ^ |
| ^ ^^ |
| ^ ^^ |
| ^^^ | -1.75
-------------------------------------------------------------
0.24 7.92
Here the algorithm found an equivalent minimum whose parameters differ from the original ones, due to the identity sin(x) = -sin(x - pi): negating the amplitude and shifting the phase by pi yields the same model.
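This equivalence can be checked numerically. The sketch below compares the original parameters [2, 0.1, 0.5, 1] against the symmetry-related set [-2, 0.1, 0.5, 1 - pi] (it uses dot-broadcast syntax, which current Julia requires for the elementwise exp and sin):

```julia
# Check the identity A*sin(theta) == -A*sin(theta - pi): the fitted
# parameters are equivalent to the original ones up to this symmetry.
model(x, p) = p[1] * exp.(-p[2] * x) .* sin.(p[3] * 2 * pi * x .+ p[4])
xs = 8 * rand(100)
# maximum pointwise difference between the two parameterizations
d = maximum(abs.(model(xs, [2, 0.1, 0.5, 1]) - model(xs, [-2, 0.1, 0.5, 1 - pi])))
```

d is zero up to floating-point rounding, confirming that both parameter sets define the same curve.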