ciren / cilib Goto Github PK

View Code? Open in Web Editor NEW

123.0 123.0 102.0 15.65 MB

Typesafe, purely functional Computational Intelligence

Home Page: https://cilib.net

License: Apache License 2.0

Scala 95.59% Nix 0.46% JavaScript 2.93% CSS 1.02%

cilib computational-intelligence functional-programming machine-learning optimization-algorithms scala

cilib's People

Contributors

Stargazers

Watchers

Forkers

jhfischer westerma filinep benniel gpampara waynejrobert ksgyoung heatzync apengelbrecht censlin anna-earwen sayamz vervas jcerda dortizarroyo robgarden afromee wm-young cicerog carlosluc rodrigocneto diegosiqueir4 ericmoraess lacerdamarcelo gustavorpe mannym2124 twd1988 arielazo aguirrediego melycontreras persica seekmas greeffstes togaurav faithagwang obothlale kola0o0 sighmin polynomial4 fusionhs loops13 wendersnawen undeadpebble dagney phlippieb edwanaba samido lfpessoa deepaurorasky jonasmwaura cim-scp cscheepers perps kharrison28 dibblesvalhalla jm10lh collinprice kazkibergetic jan1za annabosman aessedai101 francoisoberholzer arrchana salihu krharrison robinschnetler marde-helbig gerhardsmit aeranginkaman cravingoxygen umaimsaleh dannypretorius salihu1 chirodraw gitter-badger davidbreetzke kaleb-himes sobolsigizmund avanwyk datascibox sylvanhuang white-chen mewh dewalddejager swapon15 sekhar2017 devbib gregoryaustin jecs89 stefanvds vector67 kyleerwin brendan-bath nafyssa danish1681 mattlk13 shishaolong xuzhou911 nambuntu sharlottemanganye

cilib's Issues

ErrorMeasurement doesn't work

ErrorMeasurement doesn't work because getError() is not yet implemented. For an example, see FunctionMinimisationProblem's getError() method:

@Override
public double getError(Type solution) {
    throw new UnsupportedOperationException("No implementation yet.");
}

Fitness evaluation could be within RVar?

Not sure if this is the norm, but it is conceivable that a fitness evaluation could result in the use of randomness.

In that case, the fitness evaluation should change from:

NonEmptyList[A] => Objective[A]

NonEmptyList[A] => RVar[Objective[A]]

Expand website to cater for blog entries

It'll be quite handy to have a blog section on the site. It should be simple to add.

Archive size cannot be changed in simulator

The size of the archive does not change when set in the xml because the archive is thread local and it is set up in a different thread than the one from which it runs. This results in the archive being overwritten by a default archive in the running thread.

Current syntax is magic

The current syntax for Step, StepS and algorithm usage seems a bit magical - can we simplify it?

GD seems to have a subtle problem (stagnation)?

Its been reported that the GD in the NN seems to be breaking in some subtle way. The learning seems to stagnate. A user recoded the GD as a test and his implementation seems to workaround the problem he was having regarding the stagnation.

Not sure if this is valid or not - waiting on some code.

Explore alternate `Iteration` implementation

The iteration level of the algorithm computation probably needs to be looked at again.

The current implementation is awkward and a simplification should be possible. This might mean that changes would be needed to be applied to Step as well. Free+Coyneda seems possible, but not sure if it's a direction that should be pursued?

Move algorithm-specific behaviour from entity to the relevant algorithms.

Entities should be pure data structures and behaviour should be purely determined by the iteration strategies. Ideally, there can only be one Entity class.

MVar / IORef or similar for algorithm running state?

Should the algorithm running state rather not be directly placed into the algorithm via some kind of locally type safe, mutable variable akin to IORef / MVar or similar?

This could cleanup the GCPSO and other related algorithms? Needs investigation.

Use monocle lenses

Use Monocle lenses

Sample algorithm usage in examples on website

The website needs some explanation of the samples, to guide the user along.

Verify hypothesis test for RNG distributions

The tests perform a hypothesis test using Anderson-Darling and this sometimes fails - not sure why.

Should double check the implementation to make sure that there is not a small error that is the cause of these failures.

Add Estimation of Distribution Algorithms (EDAs)

Estimation of Distribution Algorithms are stochastic optimisation techniques that take a model-based approach in contract to a population-based approach of genetic algorithms. This is done by building probabilistic models of the most promising candidate solutions. This model is then sampled for more candidate solution and the model is updated/refined iteratively. The model represents the probability distribution of candidate solutions where the probabilities are proportional to the quality of the solution. For a great survey of these algorithms, see this paper.

I would like to merge the following two algorithms into CIlib:

Univariate Marginal Distribution Algorithm (UMDA)
Stochastic Hill-Climbing with Learning by Vectors of Normal Distributions (SHCLVND)

UMDA uses a univariate model and discrete variables, specifically binary strings. SHCLVND is uses real-valued vectors and all variables are assumed to be independent.

Future algorithms to add:

Population-Based Incremental Learning (PBIL)
Extended Compact Genetic Algorithm (ECGA)
Mixed Iterated Density Estimation Evolutionary Algorithm (mIDEA)

I have some questions:

Should the EDAs go in their own package, such as cilib.eda? They are relatively similar to GAs so I'm not sure whether to put them in the cilib.ga package.
How should discrete problems be handled? The position of an Entity is declared as the Spire Numeric type so I am wondering how to support bit vectors, especially the scodec-bits library that @gpampara suggested I use

Thank you!

Remove CIlibIOException or extend RuntimeException

There is no clean method for handling checked exception, CIIOException should be removed or made to extend RuntimeException.

Migrate to scalaz 7.2

Currently, updating the dependency results in a massive scalac crash, when compiling the Heterogenous code.

Will need to isolate and determine why.

Documentation should publish forcefully?

The current documentation website is updated after commits to the main branch. This is fine, but is creating many commits, possibly pointlessly. The tut process will result in the examples always having the same output but the reported memory addresses for the instance references will change.

It would be better to force push this branch, as maintaining the history is not really relevant.

DSL to replace XML files

This issue is more like a meta-issue where various aspects for the 1.0 release will be noted.

Replace XML with DSL. Scala is the choice for the embedded DSL.
Update distribution for remote execution.

Extend the Fitness type to have the notion of a "Penalty" type

Currently the Fitness hierarchy is defined to have either a MinimizationFitness, MaximizationFitness or an InferiorFitness.

The MinimizationFitness and MaximizationFitness should be removed and replaced with a simple ValidFitness or some such because the optimization type is maintained using the Optimization abstraction. As a result, this should change the Fitness types to be something like the following:

public abstract class Fitness {
     private class Valid extends Fitness {...}
     private class Penalty extends Fitness {...}
     private class Invalid extends Fitness {...}
     ....
}

Where is the documentation?

The cilib site doesn't seem to be working:

Move package names away from sourceforge

We have long since moved away from net.sourceforge.

All packages should be renamed to net.cilib

Rework `exec` - Runner API

The Runner API is rather horrible. It needs a rework into a structure that more clearly defines what we are really wanting. The current code was written just to get something working, but it is not nice.

Generalized Entity creation functions

Some helper functions to ease the creation of Entity instances for use in algorithms.

This would mean removing createPosition (in Entity.scala) and friends to a better location and generalizing them more.

FFA shouldn't be using DescendingFitnessComparator

If I understand the code correctly, this will always return the worst solution in getBestSolution and getSolutions.

Fix the tests for MetricSpace

Currently the laws for metric space are being tested, with the exclusion of euclidean due to what seems to be IEEE numbers failing.

Tests need to be corrected and added:

Fix laws for metric spaces
Tests for monad instance
Tests for semigroup instance?
Tests for profunctor instance?

Updated PSO implementations

Update the implementations for the following:

Modularize project further

The core project is stabilizing now, barring one issue. Once that issue is corrected, the PSO / GA etc should be extracted into separate modules so that users can pull in only what is needed.

Define an Algorithm type?

Right now, algorithms are defined using the function signature:

List[Entity[S,A]] => Entity[S,A] => Step[S,A]

Would introducing another type to represent that structure be a good idea?

Additional examples of PSOs

Additional example files are needed for:

Memory-leak-like behaviour.

Memory usage gradually increases until it runs out of memory.
Example files causing the problem:
https://dl.dropboxusercontent.com/u/21116601/nntraining.xml
https://dl.dropboxusercontent.com/u/21116601/twoSpiral.arff

In this example, the first sim works fine. The second one shows the problem.

Data race on Simulator.progress

Hello,

I'm a phd student at University of Illinois working on IteRace (https://github.com/cos/IteRace), a tool for finding concurrency bugs in parallel programs. I'm using IteRace on Cilib and found a data race in Simulator on the update of the progress field:

The progress fields holds an unsynchronized HashMap:

this.progress = new HashMap<Simulation, Double>();

The notifyProgress method accesses the progress field and can be accessed from multiple threads.

    private synchronized void notifyProgress() {
        double ave = 0;
        for (Double tmp : progress.values()) {
            ave += tmp.doubleValue();
        }

        ave /= progress.size();

The method is synchronized so all is well so far. The problem appears in the unsynchronized updateProgress method which can also be accessed from multiple threads:

    void updateProgress(Simulation simulation, double percentageComplete) {
        progress.put(simulation, percentageComplete);
        notifyProgress();
    }

I see that the updateProgress method is only accessed when an Algorithm has finished or completed, but multiple algorithms are executed by the ExecutorService in parallel, so it can still race. If the algorithms are relatively long-running, the data race won't reveal itself easily, but it is still there.

An easy fix would be to synchronize the updateProgress instead or in addition to notifyProgress. Alternatively, if you want finer-grained locks, you could wrap the HashMap using Collections.synchronizedMap and synchronize on the wrapped collection during the notifyProgress iteration:

this.progress = Collections.synchronizedMap(new HashMap<Simulation, Double>());
...
synchronized(progress) {
        for (Double tmp : progress.values()) {
            ave += tmp.doubleValue();
        }

        ave /= progress.size();
}

de.xml no longer works.

Tried running de.xml using the cilib-simulator-assembly-0.9-SNAPSHOT.jar simulator and receive the following error.

Starting simulation 1 of 6.
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at net.sourceforge.cilib.simulator.Simulator.execute(Simulator.java:122)
    at net.sourceforge.cilib.simulator.SimulatorShell.execute(SimulatorShell.java:87)
    at net.sourceforge.cilib.simulator.Main.main(Main.java:50)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at net.sourceforge.cilib.simulator.Simulator.execute(Simulator.java:116)
    ... 2 more
Caused by: java.lang.NullPointerException
    at net.sourceforge.cilib.ec.EC$1.f(EC.java:74)
    at net.sourceforge.cilib.ec.EC$1.f(EC.java:71)
    at fj.data.List.map(List.java:255)
    at net.sourceforge.cilib.ec.EC.algorithmInitialisation(EC.java:70)
    at net.sourceforge.cilib.algorithm.AbstractAlgorithm.performInitialisation(AbstractAlgorithm.java:98)
    at net.sourceforge.cilib.simulator.Simulation.init(Simulation.java:50)
    at net.sourceforge.cilib.simulator.Simulation.run(Simulation.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Genetic algorithm / Differential evolution / similar

Add the needed algorithmic components to cater for:

Genetic Algorithms
Differential Evolution

What is the difference between Guide and Selection?

Both Guide and Selection seem to be doing very similar things. Could we define them as the same or perhaps one in terms of the other?

type Selection[A] = (List[A], A) => List[A]
type Crossover[A] = NonEmptyList[Position[A]] => Step[A,NonEmptyList[Position[A]]]
type Guide[S,A]   = (List[Particle[S,A]], Particle[S,A]) => Step[A,Position[A]]

Additionally, is it true that a Crossover will always result in a resulting Entity?

Entity classes should all be immutable

The number of issues related to not having the entity classes immutable is insane. We should move to using immutable entity instances

Prefer lenses for accessing Entity details

Right now the Entity is essentially a Tuple2. This should become a case class, which would allow us to use lenses to cleanly get to portions of the entity without needing to traverse the object graph at every point where some Entity detail is needed.

Sigmoid active range should account for steepness

The get(Lower/Upper)ActiveRange range methods in Sigmoid does not take steepness account and is only technically correct if the steepness is 1.0.

Assembly Build Error

On Windows some users experience the following error after entering the ''sbt assembly'' command:

[error] C:\Users\User\.ivy2\cache\asm\asm-analysis\jars\asm-analysis-3.3.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\asm\asm-tree\jars\asm-tree-3.3.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\asm\asm-util\jars\asm-util-3.3.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\asm\asm\jars\asm-3.3.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\com.google.code.findbugs\jsr305\jars\jsr305-1.3.9.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\com.google.guava\guava\jars\guava-11.0.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\org.functionaljava\functionaljava\jars\functionaljava-3.1.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\org.parboiled\parboiled-core\jars\parboiled-core-0.11.0.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\org.parboiled\parboiled-java\jars\parboiled-java-0.11.0.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.ivy2\cache\org.scalaz\scalaz-core_2.9.2\jars\scalaz-core_2.9.2-6.0.4.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.sbt\boot\scala-2.9.2\lib\scala-compiler.jar:META-INF/MANIFEST.MF
[error] C:\Users\User\.sbt\boot\scala-2.9.2\lib\scala-library.jar:META-INF/MANIFEST.MF

A temporary solution is to add the following to the end of the build.sbt file located in the simulator folder:

(mergeStrategy in assembly) <<= (mergeStrategy in assembly) { (old) => {
    case n if n.startsWith("META-INF") => MergeStrategy.rename
    case x => old(x)
}}

Add Cooperative PSO and Variants

Implement the Cooperative Particle Swarm Optimisation (CPSO-S) algorithm by Engelbrecht and van den Bergh.

Some variants to implement as well:

CPSO-Sk
CPSO-Rk
CPSO-Hk

Issues/Thoughts:
Sub-swarms are a commonality amongst many different cooperative algorithms and niching/speciation approaches. This should perhaps be supported by the library.

Remove getClone() from source

The getClone() is an effect of the bean way of writing java, resulting in zero benefit for a large amount of pain. It should be removed with prejudice.

WIP: Rotated function wrapper

Using breeze we can rotate functions like this:

def rotated(f: (Seq[Double]) => Option[Double], dim: Int) = {

  // create rotation matrix using QR factorisation
  val rotation = qr.justQ(DenseMatrix.rand(dim, dim))

  // call wrapped function with rotated input
  (x: Seq[Double]) => f((rotation * DenseVector(x.toArray)).toArray)
}

A few questions:

DenseMatrix.rand is used in this example. This needs to use our RNG. We can create a random matrix with DenseMatrix.tabulate(dim, dim){ (Int, Int) => T }, but I'm not sure how to do it in order to end up with an RVar.
Is it possible to instantiate the rotation matrix lazily, such that it can infer the dimension the first time it is called?
How does the rotation affect the domain of the function? (#221 defines domains for certain functions. It makes sense for functions that are only defined for a specific domain, however this is probably not the best way to do it)

Slowness observed in change to use spire.Interval

When the custom Interval code was removed in favor of the the Interval from spire, it seemed to have slowed down the general execution of algorithms. This needs to be compared and benchmarked in order to find the reason for the slow-down

Add Set-Based PSO

Implement the Set-Based PSO (SBPSO) algorithm by Langeveldt and Engelbrecht.

Both the position and velocity of particles consist of sets. What these sets contain is not important as long as they can be identified uniquely, eg. Position can be a set of 3-tuples of a string, an integer and a double. Velocities are defined as a set of pairs where each pair consists of an operation (Either removal or addition) and an element (The same type as the elements of the position set). The search space is defined as the set of all possible elements (The whole universe) and all positions are an element in the power set of the universe. Only the fitness function cares about what the elements represent - The rest of the algorithm only tests for equality of elements to perform operations such as addition and difference.

Support will have to be added to be able to support these data types.

Should Positions maintain their size?

Should the Position maintain its size on the type level? This is somewhat like the Sized collection idea from shapeless.

There are several benefits, but I'm sure that there are problems associated with this as well. What about dynamically sized positions (like in some GA implementations)?

Prepare testing generators and arbitrary instances

There is some duplication in the CIlibArbitrary class - needs to be cleaned up and simplified

Extract iteration strategies from algorithms

Looking at PSO / EC / FF / etc they all have the notion of a synchronous and asynchronous iteration.

In the PSO its formalized as sync and async, whereas in the case of EC it is termed 'generational' (sync) and 'steady-state' (async)

It would be a good idea to separate the iteration scheme from the algorithm.

i.e: Simulation -> iteration scheme -> algorithm

GA <crossoverStrategy/> tag not working

I tried running a GA simulation and the crossoverStrategy tag does not seem to be working. Is this a bug or am I doing something wrong? Here is my XML of the algorithms tag:

<algorithm id="ga" class="ec.EC">
        <iterationStrategy class="ec.iterationstrategies.GeneticAlgorithmIterationStrategy">
            <crossoverStrategy class="entity.operators.crossover.UniformCrossoverStrategy"/>
            <mutationStrategy class="entity.operators.mutation.GaussianMutationStrategy">
                <mutationProbability class="controlparameter.ConstantControlParameter" parameter="0.3" />
            </mutationStrategy>
        </iterationStrategy>
        <initialisationStrategy class="algorithm.initialisation.ClonedPopulationInitialisationStrategy">
            <entityNumber value="50"/>
            <entityType class="ec.Individual"/>
        </initialisationStrategy>
        <addStoppingCondition class="stoppingcondition.MeasuredStoppingCondition" target="2000"/>
    </algorithm>

Expand expected pre-defined lenses

Lenses to get fitness values etc from positions - after the changes introducing Objective this was missed for update.

Auto deploy of docs/site

Now that the management of the site building process is finally working, we should add it to the travis build process so that it will auto deploy the site on successful build (perhaps on tags?) so that the site is always up to date without manual intervention.

Implement Lambda Gamma feedforward neural networks.

Extension of NN framework to support lambda gamma FFNNs.

Time varying iteration schemes

Iteration schemes to allow for iteration based varying algorithm parameter values.