epistimio / orion
Asynchronous Distributed Hyperparameter Optimization.
Home Page: https://orion.readthedocs.io
License: Other
An example is tests/functional/branching/test_branching.py::test_new_algo_not_resolved.
It passes when running tox -e py36 tests/functional/branching/test_branching.py::test_new_algo_not_resolved,
but fails when testing directly with pytest tests/functional/branching/test_branching.py::test_new_algo_not_resolved
(or, similarly, python setup.py test --addopts "tests/functional/branching/test_branching.py::test_new_algo_not_resolved").
This makes me think that something is not getting set up correctly with pytest, so FYI: for functional tests, just test locally with tox.
Full stack trace for pytest:
================================== short test summary info ==================================
FAIL tests/functional/branching/test_branching.py::test_new_algo_not_resolved
========================================= FAILURES ==========================================
________________________________ test_new_algo_not_resolved _________________________________
init_full_x = None
def test_new_algo_not_resolved(init_full_x):
"""Test that new algo conflict is not automatically resolved"""
name = "full_x"
branch = "full_x_new_algo"
with pytest.raises(OSError) as exc:
orion.core.cli.main(
("init_only -n {name} --branch {branch} --config new_algo_config.yaml "
> "./black_box.py -x~uniform(-10,10)").format(name=name, branch=branch).split(" "))
test_branching.py:428:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../src/orion/core/cli/__init__.py:39: in main
orion_parser.execute(argv)
../../../src/orion/core/cli/base.py:71: in execute
function(args)
../../../src/orion/core/cli/init_only.py:39: in main
ExperimentBuilder().build_from(args)
../../../src/orion/core/io/experiment_builder.py:239: in build_from
experiment = self.build_from_config(full_config)
../../../src/orion/core/io/experiment_builder.py:270: in build_from_config
experiment.configure(config)
../../../src/orion/core/worker/experiment.py:385: in configure
experiment._instantiate_config(config)
../../../src/orion/core/worker/experiment.py:542: in _instantiate_config
self.algorithms = PrimaryAlgo(space, self.algorithms)
../../../src/orion/core/worker/primary_algo.py:39: in __init__
super(PrimaryAlgo, self).__init__(space, algorithm=algorithm_config)
../../../src/orion/algo/base.py:110: in __init__
space, **subalgo_kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'orion.algo.base.OptimizationAlgorithm'>, of_type = 'gradient_descent'
args = (Space([Real(name=/x, prior={uniform: (-10, 20), {}}, shape=(), default value=None)]),)
kwargs = {'learning_rate': 0.0001}, inherited_class = <class 'conftest.DumbAlgo'>
error = "Could not find implementation of BaseAlgorithm, type = 'gradient_descent'\nCurrently, there is an implementation for types:\n['random', 'dumbalgo']"
def __call__(cls, of_type, *args, **kwargs):
"""Create an object, instance of ``cls.__base__``, on first call.
:param of_type: Name of class, subclass of ``cls.__base__``, wrapper
of a database framework that will be instantiated on the first call.
:param args: positional arguments to initialize ``cls.__base__``'s instance (if any)
:param kwargs: keyword arguments to initialize ``cls.__base__``'s instance (if any)
.. seealso::
`Factory.typenames` for values of argument `of_type`.
.. seealso::
Attributes of ``cls.__base__`` and ``cls.__base__.__init__`` for
values of `args` and `kwargs`.
.. note:: New object is saved as `Factory`'s internal state.
:return: The object which was created on the first call.
"""
for inherited_class in cls.types:
if inherited_class.__name__.lower() == of_type.lower():
return inherited_class.__call__(*args, **kwargs)
error = "Could not find implementation of {0}, type = '{1}'".format(
cls.__base__.__name__, of_type)
error += "\nCurrently, there is an implementation for types:\n"
error += str(cls.typenames)
> raise NotImplementedError(error)
E NotImplementedError: Could not find implementation of BaseAlgorithm, type = 'gradient_descent'
E Currently, there is an implementation for types:
E ['random', 'dumbalgo']
../../../src/orion/core/utils/__init__.py:142: NotImplementedError
----------------------------------- Captured stderr setup -----------------------------------
Found section 'debug' in configuration. Experiments do not support this option. Ignoring.
Found section 'config' in configuration. Experiments do not support this option. Ignoring.
Found section 'auto_resolution' in configuration. Experiments do not support this option. Ignoring.
Found section 'algorithm_change' in configuration. Experiments do not support this option. Ignoring.
Found section 'user_args' in configuration. Experiments do not support this option. Ignoring.
----------------------------------- Captured stderr call ------------------------------------
Found section 'debug' in configuration. Experiments do not support this option. Ignoring.
Found section 'config' in configuration. Experiments do not support this option. Ignoring.
Found section 'auto_resolution' in configuration. Experiments do not support this option. Ignoring.
Found section 'algorithm_change' in configuration. Experiments do not support this option. Ignoring.
Found section 'branch' in configuration. Experiments do not support this option. Ignoring.
Found section 'user_args' in configuration. Experiments do not support this option. Ignoring.
1 failed in 0.62 seconds
I found a need to use an immediate count method for a specific collection in a database, in order to expose an API from Experiment that checks whether an experiment has reached the terminating condition implied by the attribute max_trials. I foresee that it will also be useful in Worker's implementation. See tsirif/orion@685ee9ac and line 123.
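A minimal sketch of how such an API could look, assuming a count method on the database interface; the collection name and query keys ('experiment', 'status') are assumptions for illustration:

class Experiment(object):
    # ... existing attributes: self._db, self.name, self.max_trials ...

    @property
    def is_done(self):
        """Check whether this experiment reached its `max_trials` limit."""
        # `count` is the proposed immediate-count method on the database.
        num_completed = self._db.count('trials', {'experiment': self.name,
                                                  'status': 'completed'})
        return num_completed >= self.max_trials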
Relevant issues: #21
Discussion from #36.
X:
Based on the trial_to_tuple code, only hyper-parameter values are passed to the score_handle, not the results? Does the score_handle need to keep results information internally to match hyper-parameters with them?
C:
I would restrict this to scoring a predictive performance scalar for trials which have not been evaluated yet. The algorithm will score based on what it has already seen from observe.
X:
score_handle could receive suspended or interrupted trials, which means there could be measurements available for the scoring functions. An example where this is necessary is Freeze-Thaw.
C:
So what kind of interface do you propose for this? Consider that algorithms speak Python data structures and NumPy only. I am moving this to an issue, because we should study Freeze-Thaw and other possible client-server stuff, and how we are going to save online measurements and replies (I suggest reusing Trial objects and the trials database).
Also, we should keep in mind that future exploitation by RL projects, like the BabyAI Game, is possible, so that an environment (user script, client) could be used to train agents (algorithm, server) asynchronously and distributed. A static trial (usually a set of hyperparameters) means the training environment's variation factors and a game instance's initial state (this is params); results is possibly the episode's return. A dynamic trial means an observation tensor from the environment + a reward scalar + possibly a set of eligible actions (this is results), and an eligible action chosen as a response to sensing this information (this is params).
The API reference is not available on orion.readthedocs.io, yet it should be, based on index.rst.
Excerpt:
.. toctree::
   :maxdepth: 2
   :caption: Code Reference

   reference/modules
This is because sphinx-apidoc is normally called on the command line. See tox's commands for documentation compilation:
sphinx-apidoc -f -E -e -M --implicit-namespaces --ext-doctest -o docs/src/reference src/orion
sphinx-build -E -W --color -c docs/src/ -b html docs/src/ docs/build/html
sphinx-build -E -W --color -c docs/src/ -b man docs/src/ docs/build/man
readthedocs does not have an option for running particular commands, so we need to embed sphinx-apidoc inside docs/src/conf.py. See ref
I didn't manage to make it work yet. I get a weird error:
Running Sphinx v1.7.2
usage: sphinx-build [OPTIONS] -o <OUTPUT_PATH> <MODULE_PATH> [EXCLUDE_PATTERN, ...]
sphinx-build: error: the following arguments are required: -o/--output-dir
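For what it's worth, that error is consistent with the Sphinx 1.7 change where apidoc moved to sphinx.ext.apidoc and its main() stopped skipping argv[0]: passing an old-style argument list (program name first) makes argparse report -o as missing, and the usage line shows sphinx-build because that is the running process. A minimal sketch of embedding apidoc in conf.py under that assumption (paths are illustrative):

# docs/src/conf.py -- run sphinx-apidoc at build time (Sphinx >= 1.7 layout)
def run_apidoc(app):
    from sphinx.ext.apidoc import main
    # Note: no program name as the first element; that is the 1.7+ convention.
    main(['-f', '-E', '-e', '-M', '--implicit-namespaces',
          '-o', 'docs/src/reference', 'src/orion'])

def setup(app):
    app.connect('builder-inited', run_apidoc)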
X:
The support for Real, Integer and Categorical by random() is very confusing. If random() adds no features for Real and Integer, I would have it support only Categorical. Also, if enum and random support identical features for Categorical, we should drop one or the other. I think random() is a better fit because of the probabilities argument.
C:
I have no argument against this. Also, I agree that random should preferably mean one thing... Constraints on polymorphism.
Fix metaopt.utils._appdirs to be able to return the correct OS-specific application directories for the case where Python lives in a virtual environment.
For *nix, it works perfectly.
For OS X: a fix for AppDirs.site_data_dir is provided at tsirif/orion@095b0d7, in branch fix/appdirs.
Implement and check correctness for:
This is a bug, if we consider our software to be installable on these OSes as well.
As a note: the site directory should be contained within the install prefix of the active Python (the one running the _appdirs module). So, for the data directory, this would mostly be sys.prefix + share[/appname[/version]], and if sys.prefix is the system site, then choose the OS-specific data dir.
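A minimal sketch of that note, assuming a hypothetical helper rather than the actual _appdirs API (the non-virtualenv fallback path is a placeholder):

import os
import sys

def site_data_dir(appname=None, version=None):
    """Hypothetical helper: prefer the active prefix inside a virtualenv."""
    # Inside a virtualenv, sys.prefix differs from the base installation.
    in_venv = getattr(sys, 'base_prefix', sys.prefix) != sys.prefix
    path = os.path.join(sys.prefix, 'share') if in_venv else '/usr/local/share'
    if appname:
        path = os.path.join(path, appname)
        if version:
            path = os.path.join(path, version)
    return path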
Looks like a wrong link in the docs: http://orion.readthedocs.io/en/latest/installing.html
I am referring to the issue raised in the 2017/11/30 meeting about the following concern:
MongoDB queries and returned objects have a unique identifier set in them by MongoDB itself, without explicitly programming so. We find using a particular ID of this kind useful, and thus we should make sure that this special key is framework-agnostic. The current MongoDB implementation has the string '_id' as the key that denotes this unique document ID per collection.
I propose fixing this key, _id, as the standard key to be expected in our interface; all other implementations would convert their own key back and forth to that particular key '_id'.
That also raises the issue of how to test this interface agreement. I think that issue #23 addresses this. There should be a test for each public interface {read, write, remove}:
Let A and B be two instances of the Dimension class, defined as follows:

Name | Type | Prior
---|---|---
x | Real | uniform(0, 10)
x | Real | uniform(0, 1)

While it is evident that these two Dimensions are not equal (since they do not allow values on the same interval), the current __eq__ function for Dimension comparison returns True. This comes from the fact that the DimensionBuilder takes the expression passed through the command line (or a config file), for example -x~uniform(0,10), and passes it through an eval call, which uses the values inside the parentheses as arguments to the function related to that distribution.
Unfortunately, this operation does not preserve the expression string.
From comment here:
Two trials with identical parameters could have different results; we can't assume the user's script is deterministic. We could, however, add a seed to the attributes of the Trial and pass it through an environment variable to the user's script. The user could seed the script with it, making it deterministic for a given pair of (params, seed).
Trials should be assigned random seeds, which are passed to the user script through exposed environment variables. There should also be an option to specify whether an experiment is noisy or deterministic. In the deterministic case, there would be no seed, or a single seed for the entire experiment.
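A minimal sketch of the user-script side, assuming a hypothetical ORION_TRIAL_SEED environment variable:

import os
import random

import numpy

# ORION_TRIAL_SEED is a hypothetical variable name for this proposal.
seed = int(os.environ.get('ORION_TRIAL_SEED', '0'))
random.seed(seed)
numpy.random.seed(seed)  # now deterministic for a given (params, seed) pair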
As in (or similar syntax)
--model-name~model_'trial.has_name'
Some procedures might require atomic operations on many rows. For that we would need a lock on the database.
Regarding MongoDB, we can read in the docs that it supports many levels of locking:
"MongoDB uses multi-granularity locking [1] that allows operations to lock at the global, database or collection level, and allows for individual storage engines to implement their own concurrency control below the collection level."
I don't know if global would be enough.
Here is a code snippet for MongoDB from @dendisuhubdy:

def lock(self):
    """Lock the database by flushing pending writes and blocking new ones."""
    try:
        lock = self.conn.fsync(lock=True)
    except pymongo.errors.ConnectionFailure as e:
        self.logger.debug("Could not lock the database")
        raise_from(DatabaseError("Could not lock due to connection failure."), e)
    return lock

def unlock(self):
    """Unlock the database."""
    try:
        unlock = self.conn.unlock()
    except pymongo.errors.ConnectionFailure as e:
        self.logger.debug("Could not unlock the database")
        raise_from(DatabaseError("Could not unlock due to connection failure."), e)
    return unlock
When a query is issued to MongoDB asking for a specific <name, user>, recent trials, completed trials, or the sorted top-10 results, Mongo needs to iterate through the entire collection. If we specify indexes on <name, user>, for instance, it can use its built indexes to quickly fetch documents matching <name, user>.
The choice of indexes is basically a compromise between efficiency and memory consumption.
Note: specifying <experiment name, user> as a unique index will fix a race condition bug in experiment creation.
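A minimal pymongo sketch of that note (the database and field names, e.g. 'metadata.user', are assumptions):

import pymongo

client = pymongo.MongoClient()
db = client['orion_test']
# Unique compound index on (name, user): speeds up lookups and makes concurrent
# creation of the same experiment fail instead of silently duplicating it.
db.experiments.create_index(
    [('name', pymongo.ASCENDING), ('metadata.user', pymongo.ASCENDING)],
    unique=True)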
Proposal: add a new subcommand, orion status -n <name>. It should output the experiment information: best objective so far, total number of trials, number of trials finished, etc.
Discussion from #44 (comment), recopied below for convenience.
X:
@tsirif Do you think observe and judge could be merged? Otherwise I think they should be next to each other with a clear distinction between them in the doc. Besides the fact that one is on the results and the other on intermediate results, what is the difference?
The essential differences are only of purpose, not of structure.
So that:
observe takes points and results and returns None; however, suggest is called to return new samples.
judge takes a point and measurements and possibly returns serializable data.
Having said that, it can be the case that whatever part of the algorithm is expected to act dynamically with a trial under evaluation could be thought of as a subclass of BaseAlgorithm and do exactly the same stuff as it.
So an alternative to what exists right now could be:
DynamicAlgo inherits BaseAlgorithm and implements default reactions; it also mixes in the property should_suspend and possibly score. Its init necessarily needs as positional arguments a Space object (as a BaseAlgorithm) and a Channel object (I will come to that shortly).
PrimaryAlgo inherits DynamicAlgo. It necessarily holds a BaseAlgorithm and possibly a DynamicAlgo. It is also delegated to route calls to observe/suggest correctly to its components, based on the question "is a trial currently active?".
An appropriate Channel object is exposed by implementations of BaseAlgorithm, and it fulfills an API proposed by a concrete implementation of a DynamicAlgo.
Example:
FreezeThaw implements and inherits DynamicAlgo. It also exposes an API through an abstract class to be implemented, FreezeThaw.Channel. FreezeThaw necessarily owns an object of this Channel.
A certain algorithm, an implementation of BaseAlgorithm, wants to expose its state information in a manner which is useful to FreezeThaw. The developer of this algorithm expresses compatibility with FreezeThaw by implementing the interface class FreezeThaw.Channel. A concrete instance is retrieved by calling a BaseAlgorithm property (perhaps BaseAlgorithm.channel) which will instantiate conditionally on the optional existence of the FreezeThaw algorithm on the system (poll for DynamicAlgo.__subclasses__() or a corresponding Factory class for its types!).
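A minimal sketch of these interfaces; the names follow the proposal above, but every signature here is an assumption:

from abc import ABC, abstractmethod

class Channel(ABC):
    """Interface through which a dynamic algorithm reads a base algorithm's state."""

    @abstractmethod
    def state(self):
        """Return whatever optimizer state the dynamic algorithm needs."""

class DynamicAlgo(ABC):
    """Acts dynamically on a trial under evaluation."""

    def __init__(self, space, channel):
        self.space = space      # as in BaseAlgorithm
        self.channel = channel  # concrete Channel exposed by a BaseAlgorithm

    @property
    def should_suspend(self):
        """Default reaction: never suspend a running trial."""
        return False

    def judge(self, point, measurements):
        """Score a trial under evaluation; may return serializable data."""
        return None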
We discuss possible ways to organize the source code and distribute our Python packages.
First of all, a distinction must be made between various terms referring to the development and distribution process, to avoid confusion. All following definitions come from either a PEP, the Python Reference or the Python Glossary:
- A package is a Python module with a __path__ attribute.
- A regular package is a directory containing an __init__.py file.
- A namespace package has no __init__.py file. Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package's __path__ must be formulated. With the enhancement proposed in PEP 420, the import machinery itself will construct the list of directories that make up the package.

[1] Differences between namespace packages and regular packages, from PEP 420
[2] Packaging namespace packages, from PyPA's Python Packaging User Guide
[3] Namespace packages, from setuptools
[4] Import system, packages, regular packages, namespace packages, from Python Reference
[5] PEP 328 -- Imports: Multi-Line and Absolute/Relative
[6] PEP 366 -- Main module explicit relative imports
[7] PEP 420 -- Implicit Namespace Packages
User wants to use an algorithm:
- Obtain the BaseAlgorithm implementation corresponding to one's preferred algorithm.
Developer wants to develop an algorithm:
- Write a module under the namespace package metaopt.algo. This module contains a class which interfaces BaseAlgorithm.
- Advertise that class as an entry point in setup.py, under the group name OptimizationAlgorithm. This is the way that a class can be made discoverable to the metaopt.core package.
The software ecosystem is organized under the package name metaopt, which is a namespace package composed of the metaopt.core, metaopt.algo and metaopt.client subpackages.
- metaopt.core: This regular package contains packages and modules which implement core functionality and console scripts.
- metaopt.algo: Self-contained namespace package which contains the base.py module, the space.py module, and any possible algorithm implementation.
- metaopt.client: Regular package with helper code to be used from native Python user scripts to communicate with the parent process from metaopt.core. It contains the function report_results, and in the future it will also contain functions for online statistics reporting during the evaluation of a trial, as well as (possibly) receiving responses from the parent process (optimization algorithm).
Optimization algorithms ought to interface the BaseAlgorithm class contained in the metaopt.algo.base module and advertise their implementations as entry points in the setup.py of their distribution, under the group name OptimizationAlgorithm.
Implementation note: OptimizationAlgorithm (code) is the name of the class which subclasses BaseAlgorithm and is created by the metaclass Factory (code). Factory functionality is as follows:
- It uses pkg_resources.iter_entry_points (from package setuptools) to find any advertised entry points under the group name which coincides with the Factory-typed subclass's name (in this case OptimizationAlgorithm).
- It also discovers subclasses through cls.__base__.__subclasses__().
- When Factory is called to create another class, it checks the __call__ parameter of_type against known subclass names and, if found, it calls the corresponding class and returns an object instance.
This Factory pattern is reused 3 times in total in the current metaopt code. The corresponding names of the Factory-typed classes are: Database in the metaopt.core.io.database package, Converter in the metaopt.core.io.convert module, and OptimizationAlgorithm in the metaopt.algo.base module.
[1] Metaclasses from Python Reference.
[2] A nice blogpost with UML-like diagrams to understand Python data model structure and flow (based around metaclasses).
[3] Another nice blogpost to complement setuptools' reference about entry points.
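A rough, simplified sketch of that Factory metaclass (the real implementation lives in metaopt's utils; details here are assumptions):

import pkg_resources

class Factory(type):
    """Metaclass: resolve `of_type` to a registered subclass at call time."""

    def __init__(cls, names, bases, namespace):
        super(Factory, cls).__init__(names, bases, namespace)
        # Entry points advertised under this class's name as group,
        # e.g. 'OptimizationAlgorithm', plus plain Python subclasses.
        cls.types = [ep.load() for ep in
                     pkg_resources.iter_entry_points(cls.__name__)]
        cls.types += cls.__base__.__subclasses__()
        cls.typenames = [t.__name__.lower() for t in cls.types]

    def __call__(cls, of_type, *args, **kwargs):
        for inherited_class in cls.types:
            if inherited_class.__name__.lower() == of_type.lower():
                return inherited_class(*args, **kwargs)
        raise NotImplementedError(
            "Could not find implementation of {0}, type = '{1}'".format(
                cls.__base__.__name__, of_type))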
The following proposals discuss possible solutions to the problem of managing the source code and distributing software of BaseAlgorithm implementations which the core developing team develops. From the state of affairs above, it is apparent that if an external contributor or researcher wishes to extend the software with an algorithm of one's own, but not contribute it, this can be achieved easily and without knowledge of anything other than what is contained in the namespace package metaopt.algo.
In addition, in any of the following schemes it makes sense (although this statement is up for discussion if needed) for any 'trivial' implementation (e.g. the metaopt.algo.random module) whose dependencies are a subset of metaopt.core's dependencies to reside in the "core" distribution.
Scheme 1. Abstract: This scheme suggests that extensions should be grouped according to their external (to the metaopt.core software) dependencies and be independently developed and distributed.
Scheme 2. Abstract: This scheme suggests that extensions should be grouped analogously to the first scheme, and be independently developed but distributed centrally by means of git submodules.
Scheme 3. Abstract: This scheme suggests that extensions which have internal dependencies and are core should be placed in a central package (i.e. metaopt.algo?), and contributions which perhaps have external dependencies in a separate directory (e.g. contrib). All code is developed in the same git repository and published under the same Python distribution.
[1] Creating and Discovering Plugins from PyPA's Python Packaging User Guide
We have test_connection_with_uri and test_overwrite_uri, but there is no test_connection_with_args.
Perhaps it will be useful in the future to have an option in the database interface for returning or acting on only the most recent matched documents.
An attempt lies at tsirif/orion@cda014b, on branch feature/db/most_recent.
Right now DB.read() returns a list if there are many elements, a dictionary if there is only one, and None if nothing is found. I find this behavior very inconvenient for any code that tries to read a list of documents.

docs = database.read("collection_name", {})
if not isinstance(docs, list) and docs is not None:
    list_of_docs = [docs]
elif docs is None:
    list_of_docs = []
else:
    list_of_docs = docs
# Can now work with the list

In my opinion DB.read() should simply return lists. It's easier to work with a list when we only want one item than it is to work with the current behavior when we want a list.
When a trial is selected, it will completely overwrite the database row when setting its status to reserved. However, there could be a race condition in the meantime and another process could have written the row already. To avoid this, the query in write() should ensure that the status is the one we got from the first read(). If the query fails, the method needs to retry another reservation. We can assume that a trial's db row cannot change if its status is not 'reserved', so such a query would be sufficient to ensure atomicity (if the underlying database code is something like find_and_update).

acknowledged = self._db.write('trials', selected_trial.to_dict(),
                              query={'_id': selected_trial.id,
                                     'status': selected_trial.status})
if not acknowledged:
    return self.reserve_trial(score_handle)
return selected_trial
Rationale: Some algorithms operate only on ordered spaces, some others only on real spaces. So defined algorithms should have a class "property" to declare their own requirements on the parameter domain they are able to operate on. In response, metaopt.core should define and implement Transformer classes, which will be able to decorate Dimension objects. metaopt.core.worker.primary_algo implements the wrapper class which will poll the algorithm attribute for requests and will wrap the dimensions in the given space object during __init__ to fulfill them. Then a space will be constructed with the transformed dimensions, and this will be set on the algorithms. The original space is preserved.
This proposal is made having in mind that the transformers' existence and functionality should be "hidden" from the Experiment class and also from BaseAlgorithm implementations. For Experiment's and Producer's side this is done by the existence of the PrimaryAlgo class; for BaseAlgorithm's, it is done by providing an agnostic interface through the Space object.
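A minimal sketch of the transformation idea under those constraints (the class and method names are hypothetical):

class Quantize(object):
    """Hypothetical Transformer: present an Integer dimension as a Real one."""

    def __init__(self, dim):
        self.dim = dim  # the original Dimension is preserved, only decorated

    def transform(self, point):
        # A real-requiring algorithm sees a float.
        return float(point)

    def reverse(self, point):
        # Samples going back out are mapped to the original domain.
        return int(round(point))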
Copying discussion from #83.
T:
Q: Is the default value part of Trial.Param?
X:
Well, a Trial.Param instance is basically a dimension.sample() turned into a param. So, if we have a default value, Trial.Param is a very convenient and natural way to encapsulate such a default value. Furthermore, for DimensionAddition and DimensionDeletion we need to add a Trial.Param object to each trial, so using such encapsulation for the default value is very, very convenient.
T:
I agree. This means that default_value should be a slotted attribute of Trial.Param?
Context: #36 (comment)
It's not clear to me how the experiment documents are structured based on the yaml file. What is references, precisely?
I would not add this in the DB; it can be computed on the fly.
For workers, and especially for resources, I cannot say. This part of the project is not clear enough for me right now to think about the DB structure for them.
Regarding authorization, users are currently expected either to provide a compatible URL string in the METAOPT_DB_ADDRESS environment variable, which can contain username:password credentials, or to provide them in a moptconfig.yaml file under the keys database: username: 'asdfa' and database: password: 'asdfasfa'. The first uses environment variables, the second a plain text file in the user's directory. Both of these can be considered unsafe.
We should discuss better ways of requesting the user's credentials for the database, perhaps by using the Python std module getpass. Complications may arise in how to distribute the credentials to multiple processes on perhaps multiple nodes, which can be the general scenario. The reason is that there must be one MongoClient (for instance) for each worker, all of these use the same options (and hence credentials), and we would like to request the credentials once. Perhaps an ssh mechanism is necessary for the multi-node case; I am not very familiar with the issues that can arise here.
So we should discuss and organize a solution that perhaps fits all scenarios, but is not enforced if things are kept on a single local node.
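A minimal getpass sketch for the single-node case (the URI format is illustrative):

import getpass

username = input("Database username: ")
password = getpass.getpass("Database password: ")  # never echoed, never stored on disk
uri = "mongodb://{0}:{1}@localhost/orion_test".format(username, password)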
Issue has been raised by @nasimrahaman.
So last night I had an epiphany. I was thinking about how the brain works, and there is so far no global loss function that our brain minimizes; or if there is any, there are only local losses related to each synaptic response.
So I thought of maybe something else, not evolutionary either. There has been a method in which the neuron itself fires given an input and results in, say, y. Given another set of inputs it fires and goes to y_1, which is the wrong firing output; it should be at y_1', and the delta y_1' - y_1 would be adjusted in the next iteration.
So there are only two solutions to this: either the weights are perturbed in a way that there are local adjustments to the outputs or the firing, or there is a sudden highway of information between two unconnected neurons. See http://neuronaldynamics.epfl.ch/online/Ch19.S1.html
In this case, I thought of scrapping backpropagation totally and trying a new measure. Why don't we add the weights as our hyperparameters and perturb them with Oríon?
[Mumbling in my own thoughts]
Relative discussion.
We would like to support fetching multiple matched documents on a write (meant as an update) or a remove, in an atomic way. Preferably using a single command, if the database framework supports it; otherwise, a locking mechanism should probably be devised and used (see #10).
A first approach, currently returning only one document and applying the update or remove on that, lies at tsirif/orion@ab159cf, on branch feature/db/atomic.
Copying discussion from #83.
T:
This should have been done earlier. It is useful for the hash_name property as well. Shall I write a __hash__ and change the implementation of hash_name? I think I should. This will be in a future PR.
X:
Yes, sure; if there is hashing, that is certainly related to __eq__.
The reason is that in #69, a unique string identifier per trial may be requested from the user to disambiguate among temporary local disk resource locations that may be used for resumability or logging. The implementation of Trial.hash_name is suboptimal.
Out of the possible things that can be given through the metaopt configuration procedure, the final configuration dictionary is going to have keys which correspond to existing Experiment entities and keys which correspond to AbstractAlgorithm instances.
An algorithm can be described as:
- An object instantiated directly by YAML (see "Constructors, representers, resolvers" in this).
- A class identifier to be resolved through the metaopt.utils.Factory interface, using an algorithm implementation identifier (like a class name). This one will search for an implementation to instantiate using two mechanisms: pkg_resources entry points, to select a registered interface (from a metaopt.algo extension) and check whether it is a proper subclass of the base class; and cls.__base__.__subclasses__(), as in the Factory note above.
All of these ways must finally be resolved by Experiment._sanitize_config, which will instantiate and validate any objects that can be inferred from the configuration. It should not expect that all values in the nested dict will correspond to numericals, dicts, or class identifiers. There may also be objects instantiated immediately by YAML itself; in this case it should check whether they are truly instances of what they are supposed to be.
However, as we agreed on nested algorithm configurations (instead of dicts), I will have to create a way to recognize and instantiate AbstractAlgorithm objects recursively from within AbstractAlgorithm.__init__, using checks where appropriate and metaopt.utils.Factory. This in turn kinda implies that all parameters expected by an implementation (as they can differ among implementations, and as we want to minimize code repetition and requirements for people implementing extensions) must somehow be exported from the inherited class to the base one. This can be done by using class variables [e.g. cls.params for name declaration, cls.defaults for their defaults, cls.paramtypes for their types (optional), cls.description for documentation (to enhance cls.__doc__ with parameter documentation)]; I am not considering cls.__slots__ here, as many algorithms must hold internal state in order to be functional. A decorator can be exported from metaopt.algo.base to facilitate declaring new parameters.
For an experiment, we should discuss how the values are specified. All that it takes for a primary key of an experiment is the pair (name, user). So this definitely specifies one Experiment, and it should be part of specifying a referent experiment.
I propose that every implementation of an algorithm should have a mechanism (a general configuration property) that creates the settings of an AbstractAlgorithm object in dictionary form. This will serve 2 different purposes:
- It will be used by Experiment.configuration to serve an experiment's settings in dict form.
A little note to be taken here is that AbstractAlgorithm.configuration must recognize when some attributes or values are AbstractAlgorithm instances and call itself recursively. This is because: (a) it is convenient, since algorithm objects can decorate themselves with other instances, and (b) we agreed that an algorithm's complete specification in an experiment's settings should be in dictionary form.
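A minimal sketch of that recursive configuration property (all details here are assumptions):

class AbstractAlgorithm(object):
    def __init__(self, space, **params):
        self.space = space
        self._params = params

    @property
    def configuration(self):
        """Settings in dict form; nested algorithms serialize recursively."""
        config = {}
        for name, value in self._params.items():
            if isinstance(value, AbstractAlgorithm):
                config[name] = value.configuration  # recurse into decorators
            else:
                config[name] = value
        return {type(self).__name__.lower(): config}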
Have you considered using http://argcomplete.readthedocs.io/en/latest/ ? This project looks very nice.
In the context of experiment version control, trials generated by an algorithm may be biased by trials coming from parent or child experiments. To help identify such potential biases, we should log, in a new trial, which trials were part of the algorithm's history at the time it was generated.
We could represent the log as a simple list of trial ids. However, if there are thousands of trials, then the size of this list will significantly outweigh the rest of the trial. This could use a significant amount of memory and slow down queries when the list of ids is selected.
Another solution is to represent the log as a tree of trial ids: a trial (1) would point to trial (2), which points to trial (3), meaning that trial (1) was created with both trial (2) and trial (3) in the algorithm's history. The worst-case scenario is a flat tree, which brings us back to the list of ids, but that does not seem like a very likely situation, unless someone samples 1000 trials before starting to execute them.
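A small sketch of the tree idea: each new trial stores only the id of the head of the history chain, and the full history is recovered by walking the pointers (the dict-based store here is illustrative only):

# trial id -> id of the latest trial in its history (None = empty history)
history_log = {
    'trial-3': None,
    'trial-2': 'trial-3',
    'trial-1': 'trial-2',  # trial-1 saw trial-2 and, transitively, trial-3
}

def history(trial_id):
    """Walk the chain to recover the full list of ancestor trial ids."""
    seen = []
    node = history_log[trial_id]
    while node is not None:
        seen.append(node)
        node = history_log[node]
    return seen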
Some strange exception; what is happening there?
Earlier, I did orion init_only on a different machine.
$ orion -vv hunt -n omni26
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/opt/user/conda/share/orion.core/orion_config.yaml.example'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/etc/xdg/orion.core/orion_config.yaml'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/opt/user/conda/share/orion.core/orion_config.yaml.example'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/etc/xdg/orion.core/orion_config.yaml'
DEBUG:orion.core.io.experiment_builder:Creating mongodb database client with args: {'name': 'test', 'host': '....'}
DEBUG:orion.core.worker.experiment:Creating Experiment object with name: omni26
INFO:orion.core.io.experiment_builder:{'name': 'omni26', 'max_trials': inf, 'pool_size': 10, 'algorithms': 'random', 'database': {'name': 'test', 'type': 'mongodb', 'host': '...'}, 'debug': False, 'auto_resolution': False, 'algorithm_change': False, 'user_args': [], 'leafs': [], 'metadata': {'orion_version': 'v0.1.0', 'user': 'serdyuk'}}
INFO:orion.core.io.experiment_builder:{'name': 'omni26', 'max_trials': inf, 'pool_size': 10, 'algorithms': 'random', 'database': {'name': 'test', 'type': 'mongodb', 'host': '...'}, 'debug': False, 'auto_resolution': False, 'algorithm_change': False, 'user_args': [], 'leafs': [], 'metadata': {'orion_version': 'v0.1.0', 'user': 'serdyuk'}}
DEBUG:orion.core.worker.experiment:Creating Experiment object with name: omni26
DEBUG:orion.core.worker.experiment:Creating Experiment object with name: omni26
WARNING:orion.core.worker.experiment:Found section 'debug' in configuration. Experiments do not support this option. Ignoring.
WARNING:orion.core.worker.experiment:Found section 'auto_resolution' in configuration. Experiments do not support this option. Ignoring.
WARNING:orion.core.worker.experiment:Found section 'algorithm_change' in configuration. Experiments do not support this option. Ignoring.
WARNING:orion.core.worker.experiment:Found section 'user_args' in configuration. Experiments do not support this option. Ignoring.
WARNING:orion.core.worker.experiment:Found section 'leafs' in configuration. Experiments do not support this option. Ignoring.
Traceback (most recent call last):
File "/opt/user/conda/bin/orion", line 11, in <module>
sys.exit(main())
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/cli/__init__.py", line 39, in main
orion_parser.execute(argv)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/cli/base.py", line 71, in execute
function(args)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/cli/hunt.py", line 53, in main
experiment = EVCBuilder().build_from(args)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/io/evc_builder.py", line 51, in build_from
experiment = ExperimentBuilder().build_from(cmdargs)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/io/experiment_builder.py", line 239, in build_from
experiment = self.build_from_config(full_config)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/io/experiment_builder.py", line 270, in build_from_config
experiment.configure(config)
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/worker/experiment.py", line 410, in configure
final_config = experiment.configuration
File "/opt/user/conda/lib/python3.6/site-packages/orion/core/worker/experiment.py", line 349, in configuration
config[attrname] = attribute.configuration
AttributeError: 'str' object has no attribute 'configuration'
Sometimes the status of trials may not be updated properly because of different kinds of failures. The database should be inspected at some point (maybe during loading of the experiment and trials), and it should be automatically cured.
For example, if a trial's status is reserved and its last timestamp is 1 day old, then the status of the trial should be automatically set to 'suspended'.
Unit tests currently cover mongodb://user:pass@url/database only. We need to test many different forms.
For instance:
mongodb://url
mongodb://url/database
mongodb://url:port/database
mongodb://user:pass@url
mongodb://user:pass@url/database
mongodb://user:pass@url/database?someOptions=someValue?someOtherOption=SomeOtherValue
We wouldn't have to test all of this if we simply passed the URI without extracting the parameters and passing them separately to MongoClient.
I don't think we need those tests right now, but we need to keep this in mind, hence the issue.
Case: During initialization of an Experiment object, we query the database for documents (configurations) with a specific (exp_name, user_name) tuple. Recently, in #55, we proposed to make this tuple a key for the experiments collection in the database. This means that we should probably also do something about the checks in lines 125-130 of metaopt.core.worker.experiment.
if len(config) > 1:
log.warning("Many (%s) experiments for (%s, %s) are available but "
"only the most recent one can be accessed. "
"Experiment forks will be supported soon.", len(config), name, user)
config = sorted(config, key=lambda x: x['metadata']['datetime'],
reverse=True)[0]
Now it is certain that a single document will be returned from this query under normal conditions. Should we remove the check? Or should we convert it to an error and throw an exception? Are there other choices? What do you think?
I think that the tests should be moved in a way that can test, through fixture parameterization, any possible AbstractDB implementation. So that:
I propose to create a module-level fixture that creates a Database (#18) object, based on a parameterized string from the list ['mongodb', '<future_implementation_name>'].
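A minimal pytest sketch of that fixture (the Database call signature is an assumption):

import pytest

@pytest.fixture(scope='module', params=['mongodb'])
def database(request):
    """One Database object per parameterized AbstractDB implementation."""
    # Database is the Factory-typed class proposed in #18.
    return Database(request.param, name='metaopt_test')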
Documentation should be revised, and a website should be published containing at least the following pages:
- the list of BaseAlgorithm implementations available
- a guide to implementing a BaseAlgorithm and publishing it as a Python distribution
Also, write tox scripts to automate building and distributing the docs.
The roadmap, changelog and contributors should probably be in the root repo directory as .rst files.
Configure sphinx where needed; for example, use sphinx.ext.intersphinx to link with the numpy and scipy docs.
I already put
export ORION_DB_ADDRESS=mongodb://user:pass@localhost
export ORION_DB_NAME=orion_test
export ORION_DB_TYPE=MongoDB
in my ~/.zshrc
I keep getting
=========================================================================================================================================================== ERRORS ===========================================================================================================================================================
_____________________________________________________________________________________________________________________________ ERROR collecting tests/functional/commands/test_insert_command.py ______________________________________________________________________________________________________________________________
tests/functional/commands/test_insert_command.py:8: in <module>
import orion.core.cli
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
____________________________________________________________________________________________________________________________________ ERROR collecting tests/functional/demo/test_demo.py _____________________________________________________________________________________________________________________________________
tests/functional/demo/test_demo.py:10: in <module>
import orion.core.cli
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
_______________________________________________________________________________________________________________________________ ERROR collecting tests/functional/parsing/test_parsing_base.py _______________________________________________________________________________________________________________________________
tests/functional/parsing/test_parsing_base.py:9: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
_______________________________________________________________________________________________________________________________ ERROR collecting tests/functional/parsing/test_parsing_hunt.py _______________________________________________________________________________________________________________________________
tests/functional/parsing/test_parsing_hunt.py:9: in <module>
from orion.core.cli import hunt
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
____________________________________________________________________________________________________________________________ ERROR collecting tests/functional/parsing/test_parsing_init_only.py _____________________________________________________________________________________________________________________________
tests/functional/parsing/test_parsing_init_only.py:9: in <module>
from orion.core.cli import init_only
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
______________________________________________________________________________________________________________________________ ERROR collecting tests/functional/parsing/test_parsing_insert.py ______________________________________________________________________________________________________________________________
tests/functional/parsing/test_parsing_insert.py:9: in <module>
from orion.core.cli import insert
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
____________________________________________________________________________________________________________________________________ ERROR collecting tests/unittests/core/test_insert.py ____________________________________________________________________________________________________________________________________
tests/unittests/core/test_insert.py:8: in <module>
from orion.core.cli.insert import _validate_input_value
src/orion/core/cli/__init__.py:15: in <module>
from orion.core.cli import resolve_config
src/orion/core/cli/resolve_config.py:80: in <module>
('ORION_DB_ADDRESS', 'host', socket.gethostbyname(socket.gethostname()))
E socket.gaierror: [Errno 8] nodename nor servname provided, or not known
on my terminal.
I'm running Mac OS X 10.13.3.
When building a new external repo for algorithm plugins to metaopt, most of the content is boilerplate. It should be easy to write an automatic script that generates new repos.
Example based on metaopt-skopt-bayes:
name='metaopt.algo.skopt.bayes',
author='Xavier Bouthillier',
author_email='[email protected]',
packages=['metaopt.algo.skopt'],
'skopt_bayes = metaopt.algo.skopt.bayes:BayesianOptimizer'
install_requires=['metaopt.core', 'scikit-optimize>=0.5.1'],
The script would issue the following prompts to build the lines described above:
package name:
author name:
author email:
packages (separated with commas) [default='metaopt.algo']:
install requirements (separated with commas) [default='metaopt.core']:
algorithm class name:
algorithm's module path [default='src/metaopt/algo/']:
To create a template like the repository example, one would provide this to the script:
package name: metaopt.algo.skopt.bayes
author name: Xavier Bouthillier
author email: [email protected]
packages (separated with commas) [default='metaopt.algo']: metaopt.algo.skopt
install requirements (separated with commas) [default='metaopt.core']: metaopt.core, scikit-optimize>=0.5.1
algorithm class name: BayesianOptimizer
algorithm's module: metaopt.algo.skopt.bayes
This would create a functional repo including LICENSE, MANIFEST.in, setup.cfg, setup.py and a file src/metaopt/algo/skopt/bayes.py containing a skeleton algorithm class definition named BayesianOptimizer. Note that, for simplicity, the entry point would be bayesianoptimizer = metaopt.algo.skopt.bayes:BayesianOptimizer, based on the algorithm's class name.
The user would then obviously be free to edit those files as desired.
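A minimal sketch of such a generator's core (the template fields mirror the example above; prompts and defaults are illustrative):

SETUP_PY = """\
from setuptools import setup

setup(
    name='{name}',
    author='{author}',
    author_email='{email}',
    packages=['{package}'],
    install_requires=[{requires}],
    entry_points={{
        'OptimizationAlgorithm': [
            '{entry} = {name}:{klass}',
        ],
    }},
)
"""

def render_setup_py():
    name = input("package name: ")
    author = input("author name: ")
    email = input("author email: ")
    package = input("packages (separated with commas) [default='metaopt.algo']: ") or 'metaopt.algo'
    requires = input("install requirements (separated with commas) [default='metaopt.core']: ") or 'metaopt.core'
    klass = input("algorithm class name: ")
    requires = ', '.join("'{0}'".format(r.strip()) for r in requires.split(','))
    # The entry-point name derives from the class name, as noted above.
    return SETUP_PY.format(name=name, author=author, email=email,
                           package=package, requires=requires,
                           entry=klass.lower(), klass=klass)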
This test fails. scipy.stats.distributions seems not to have the needed implementation to support returning the correct interval (lower and upper bound) of integer distributions like randint or poisson.
For example:

>>> from scipy.stats import distributions as dists
>>> dists.poisson.interval(alpha=1.0, mu=1.0)
(-1.0, inf)

while it should return:

(0.0, inf)

The problem is even more complex if you add a displacement (through the loc argument) or a scale (through the scale argument) to the distribution.
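A hedged workaround sketch: clip the reported interval to the distribution's support via its .a attribute; note this does not yet account for loc or scale, as stated above:

from scipy.stats import distributions as dists

low, high = dists.poisson.interval(alpha=1.0, mu=1.0)  # (-1.0, inf)
low = max(low, dists.poisson.a)  # poisson's support starts at 0
print(low, high)  # (0.0, inf)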
I refer to @bouthilx's concerns pointed out in #21.
Algorithms should be provided with an interface to give out their own worker termination criteria.
So far, there is only one criterion, from the Experiment class, i.e. Experiment.is_done, which checks whether the number of completed trials exceeds Experiment.max_trials (which comes from configuration).
What else could we expect to have here that makes sense in a distributed and asynchronous setting?
Discussion for the 13/12/2017 meeting.
@nouiz @lamblin @dendisuhubdy
Useless ^_^
We have submit_time, start_time and end_time, but that doesn't tell how long a trial has been executing if there were interruptions/suspensions in its lifetime.
Context: #36 (comment)
I believe unit tests should not be seeded. Some tests pass for some numerical values and fail for others; if we seed, we may only test the passing values. We can't test all possible values, but having unit tests running on different values at each run is good in my opinion; it helps cover more.
We would like our design to be aimed at satisfying the following 3 general requirements, in descending priority order:
So the general goal, if I am thinking about this right, is to decouple the training procedure itself from the hyperoptimization procedure.
Please complete anything missing from the aforementioned and comment on any of those. We have discussed whether it is possible to reuse some of the other hyperoptimization frameworks available, e.g. hyperopt.
So we had a use case where we needed to save the Tensorboard logs and model checkpoints inside a folder whose naming we control, like:

exp_id = uuid.uuid4().__str__()

Currently, if I'm not wrong, it's set automatically here: https://github.com/mila-udem/orion/blob/6219add391aec50b1714a2d6f5910c5f6530310e/src/orion/core/worker/experiment.py#L124
How can we pass it via the command line interface instead?
Copying and moving the discussion about the metaopt.algo.space._Discrete class from #36 here.
I believe the _Discrete class is useless. The sample method could solely reside inside the Integer class. Categorical's sample method is straightforward using numpy.random.choice. Also, the interval method is overwritten anyway by Categorical's to raise RuntimeError.
Although I could easily see how Integer and Categorical are related in theory, in our context combining them is not giving any advantage. It complexifies Categorical without giving any advantage in return.
However, I found a difficulty removing the _Discrete subclass. Remember that the _Discrete class was introduced as an alternative to remove the discrete keyword argument from Dimension. So the programming flow currently is as follows, and it takes advantage of Python mechanics regarding class inheritance.
Currently:
Suppose a call to Integer(...).sample(...). In general, calling a method on an Integer object will first search within Integer, and then super calls from there will try to find corresponding methods first in Real, then _Discrete and then Dimension.

class Integer(Real, _Discrete):
    def interval(self, alpha=1.0):
        low, high = super(Integer, self).interval(alpha)  # Call to Real.interval
        # make them integers
        return int_low, int_high

    def __contains__(self, point):
        # Check if they are integers...
        return super(Integer, self).__contains__(point)  # Call to Dimension.__contains__

# calling Integer(...).sample(...) will enter first Real.sample,
# due to the order of subclasses
class Real(Dimension):
    def interval(self, alpha=1.0):
        prior_low, prior_high = super(Real, self).interval(alpha)  # Call to Dimension.interval
        return (max(prior_low, self._low), min(prior_high, self._high))  # Add constraints to interval

    def sample(self, n_samples=1, seed=None):
        # Checks if constraints are met!
        samples = []
        for _ in range(n_samples):
            for _ in range(4):
                sample = super(Real, self).sample(1, seed)  # Calls _Discrete.sample
                if sample[0] not in self:  # Calls Integer.__contains__
                    # ...
        # ...
        return samples

class _Discrete(Dimension):
    def sample(self, n_samples=1, seed=None):
        samples = super(_Discrete, self).sample(n_samples, seed)  # Calls Dimension.sample
        # Making discrete by ourselves because scipy does not use **floor**
        return list(map(lambda x: numpy.floor(x).astype(int), samples))  # Converts to integers

So the sequence of calls goes like this:
- Integer.sample (redirects)
- Real.sample (fulfills extra constraints)
- _Discrete.sample (converts to integers)
- Dimension.sample (samples from distribution)
- Integer.__contains__ (checks if they are integers)
- Dimension.__contains__ (checks if they are within bounds)
- Integer.interval (makes it int)
- Real.interval (extra constraints)
- Dimension.interval (asks the distribution for its support)

If we remove _Discrete and move all its methods into class Integer(Real), the flow becomes:
So the sequence of calls goes like this:
- Integer.sample (converts to integers)
- Real.sample (fulfills extra constraints)
- Dimension.sample (samples from distribution) -> this possibly draws samples from a real distribution
- Integer.__contains__ (checks if they are integers) -> so in that case it will always fail to be an int
- Dimension.__contains__ (checks if they are within bounds)
- Integer.interval (makes it int)
- Real.interval (extra constraints)
- Dimension.interval (asks the distribution for its support)

Poor Dimension.sample will return whatever the scipy distribution returns, and the check in Integer.__contains__ will fail if the underlying scipy distribution corresponds to a real one... However, we should allow (I believe) discretizing real distributions, so that for instance someone could have approximately normally distributed integer stochastic variables. What do you think about this?
I agree with the idea of discretizing Real distributions. However, I don't see the problem here; can't we just move _Discrete.sample() inside Integer.sample() rather than completely removing this method and relying solely on Real.sample()?
My proposition then would be to replace this confusing inheritance path with a simple separation of sample() and _sample(), where sample() makes verifications over _sample(), which does it blindly.
So this line:

sample = super(Real, self).sample(1, seed)

becomes

sample = self._sample(1, seed)

where

class Real(Dimension):
    # ...
    def _sample(self, n_samples, seed=None):
        return super(Real, self).sample(1, seed)
    # ...

class Integer(Dimension):
    # ...
    def _sample(self, n_samples, seed=None):
        samples = super(Integer, self)._sample(n_samples, seed)
        return list(map(lambda x: numpy.floor(x).astype(int), samples))
    # ...

I think it is easier this way to track the path of execution in order to understand, modify and maintain the code.
We would now have:
- Integer.sample (converts to integers)
- Real.sample (fulfills extra constraints)
- Integer._sample
- Real._sample
- Dimension._sample (samples from distribution)
- Integer.__contains__
- Dimension.__contains__ (checks if they are within bounds)
- Integer.interval (makes it int)
- Real.interval (extra constraints)
- Dimension.interval (asks the distribution for its support)
So the argument here is whether this will be implemented by ordered multiple inheritance (super & the dynamic stuff supported by Python) or by a wrapper function such as _sample. @bouthilx dislikes the first because of the diamond relationship; I dislike the second because it has two different signatures serving the same functionality.