Comments (3)
Looks to me like we are using this functionality already:
It's especially useful for resuming an experiment and re-warming the optimizer with prior results.
I think the rough idea was that for backend optimizers that supported it, we could amortize the cost of retraining the model.
That said, I don't think this is being done well at the moment, but let's leave that topic for a future PR.
(For instance, for SMAC, I think we probably want to throw out the existing backend optimizer and bulk register everything in one shot at its initialization whenever we do that.)
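To sketch that idea (illustrative only: `BackendOptimizer`, `bulk_register`, and `rewarm` are hypothetical names for this sketch, not the real mlos_core or SMAC APIs):

```python
"""Hedged sketch of re-warming an optimizer with prior results in one shot.

All names here are stand-ins invented for illustration, not real APIs.
"""
from typing import List, Tuple

import pandas as pd


class BackendOptimizer:
    """Toy stand-in for a model-backed optimizer that refits on registration."""

    def __init__(self) -> None:
        self.observations: List[Tuple[pd.Series, pd.Series]] = []
        self.refit_count = 0  # proxy for the (expensive) model retraining cost

    def register(self, config: pd.Series, score: pd.Series) -> None:
        """Register a single observation: one refit per call."""
        self.observations.append((config, score))
        self.refit_count += 1

    def bulk_register(self, configs: pd.DataFrame, scores: pd.DataFrame) -> None:
        """Register a whole batch of observations with a single refit."""
        for (_, config), (_, score) in zip(configs.iterrows(), scores.iterrows()):
            self.observations.append((config, score))
        self.refit_count += 1


def rewarm(configs: pd.DataFrame, scores: pd.DataFrame) -> BackendOptimizer:
    """Throw out any existing backend optimizer and bulk register prior
    results at its initialization, amortizing the retraining cost.
    """
    opt = BackendOptimizer()  # fresh backend instead of incremental updates
    opt.bulk_register(configs, scores)
    return opt
```

The point of the sketch is just the cost model: n calls to `register` mean n refits, while one `bulk_register` at initialization means one.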
Back to this change, which was really only meant for code readability improvements:
in part, what you're proposing here is to change the API to return NamedTuples of individual (config, score, context, metadata) instances instead of DataFrames of the same. That is a much wider-reaching change than I was originally envisioning, I think.
Your proposal of having a class called an `Observation` is nice, but what I might do instead is something more like the following (quickly hacked together with some Copilot help).
Note that we could pretty easily extend some of these helper functions for conversion to other data structure types in the future as well.
Some caveats:

- The `__iter__` method returns the wrong type info for `scores` and `configs`, due to needing to support optional values for `contexts` and `metadatas`. An alternative would be to explicitly change the received return type in all callsites, but that might be a bunch of work too.
- There are probably some other refinements that could happen as well. As I said, this is just a quick stub idea.
```python
"""Simple dataclass for storing observations from a hyperparameter optimization run."""

import itertools
from dataclasses import dataclass
from typing import Iterator, List, Optional

import pandas as pd


@dataclass(frozen=True)
class Observation:
    """Simple dataclass for storing a single Observation."""

    config: pd.Series
    score: pd.Series
    context: Optional[pd.Series] = None
    metadata: Optional[pd.Series] = None

    def __iter__(self) -> Iterator[Optional[pd.Series]]:
        """A not quite type correct hack to allow existing code to use the Tuple
        style return values.
        """
        # Note: this should be a more efficient return type than using astuple(),
        # which makes deepcopies.
        return iter((self.config, self.score, self.context, self.metadata))


@dataclass(frozen=True)
class Observations:
    """Simple dataclass for storing observations from a hyperparameter optimization
    run.
    """

    configs: pd.DataFrame
    scores: pd.DataFrame
    context: Optional[pd.DataFrame] = None
    metadata: Optional[pd.DataFrame] = None

    def __post_init__(self) -> None:
        assert len(self.configs) == len(self.scores)
        if self.context is not None:
            assert len(self.configs) == len(self.context)
        if self.metadata is not None:
            assert len(self.configs) == len(self.metadata)

    def __iter__(self) -> Iterator[Optional[pd.DataFrame]]:
        """A not quite type correct hack to allow existing code to use the Tuple
        style return values.
        """
        # Note: this should be a more efficient return type than using astuple(),
        # which makes deepcopies.
        return iter((self.configs, self.scores, self.context, self.metadata))

    def to_observation_list(self) -> List[Observation]:
        """Convert the Observations object to a list of Observation objects."""
        # Note: iterrows() yields (index, Series) pairs, so unpack the index away.
        return [
            Observation(
                config=config,
                score=score,
                context=context,
                metadata=metadata,
            )
            for (_, config), (_, score), context, metadata in zip(
                self.configs.iterrows(),
                self.scores.iterrows(),
                (
                    (row for _, row in self.context.iterrows())
                    if self.context is not None
                    else itertools.repeat(None)
                ),
                (
                    (row for _, row in self.metadata.iterrows())
                    if self.metadata is not None
                    else itertools.repeat(None)
                ),
            )
        ]


def get_observations() -> Observations:
    """Get some dummy observations."""
    # Create some dummy data
    configs = pd.DataFrame(
        {
            "x": [1, 2, 3],
            "y": [4, 5, 6],
        }
    )
    scores = pd.DataFrame(
        {
            "score": [0.1, 0.2, 0.3],
        }
    )
    # Create an Observations object
    return Observations(configs=configs, scores=scores)


def test_observations() -> None:
    """Test the Observations dataclass."""
    # Create an Observations object
    observations = get_observations()
    observation = observations.to_observation_list()[0]
    # Print the Observations object
    print(observations)
    print(observations.configs, observations.scores)
    print(observation)
    print(observation.config, observation.score)
    # Or in tuple form using the __iter__ method:
    configs, scores, contexts, metadatas = get_observations()
    print(configs)
    print(scores)
    print(contexts)
    print(metadatas)
    config, score, context, metadata = get_observations().to_observation_list()[0]
    print(config)
    print(score)
    print(context)
    print(metadata)


if __name__ == "__main__":
    test_observations()
```
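As one example of the kind of extension mentioned above (a quick sketch; `series_list_to_frame` is a hypothetical helper name, not part of the stub), going back the other way just stacks the per-observation Series into a DataFrame:

```python
from typing import List

import pandas as pd


def series_list_to_frame(rows: List[pd.Series]) -> pd.DataFrame:
    """Stack per-observation Series (e.g., the .config fields pulled from a
    list of Observation objects) back into a single DataFrame.
    """
    # Each Series becomes one row; drop the stale per-row index values.
    return pd.DataFrame(rows).reset_index(drop=True)
```

That would make `Observations` round-trippable from a `List[Observation]` as well.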
And tbh, that `__iter__` hack kinda defeats the purpose of what we're trying to do with this change: avoiding mistakes from reordering `metadata` and `context` in return(ed) values by making them more explicit.
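To make that concern concrete (a contrived stand-in, not code from this PR): with tuple-style unpacking nothing catches a swapped `context`/`metadata`, while attribute access names each field at the callsite:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Observation:
    """Minimal stand-in for the Observation stub above."""

    config: Dict[str, Any]
    score: float
    context: Optional[Dict[str, Any]] = None
    metadata: Optional[Dict[str, Any]] = None


obs = Observation(config={"x": 1}, score=0.5, metadata={"trial_id": 7})

# With tuple-style returns, a caller can silently swap the optional fields:
#   config, score, metadata, context = some_tuple_api()  # wrong order, no error
# With attribute access, the field order cannot be confused:
assert obs.metadata == {"trial_id": 7}
assert obs.context is None
```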