Currently, I am working on condensing the return values of suggest, and the arguments

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Split registering multiple configs into a seperate function about mlos HOT 3 OPEN

jsfreischuetz commented on August 21, 2024

Split registering multiple configs into a seperate function

from mlos.

Comments (3)

jsfreischuetz commented on August 21, 2024

@bpkroth @motus

from mlos.

bpkroth commented on August 21, 2024

Looks to me like we are using this functionality already:

MLOS/mlos_bench/mlos_bench/optimizers/mlos_core_optimizer.py

Line 120 in bb6c6f2

 self._opt.register(configs=df_configs, scores=df_scores[opt_targets].astype(float)) 

It's especially useful for resuming an experiment and re-warming the optimizer with prior results.
I think the rough idea was that for backend optimizers that supported it, we could amortize the cost of retraining the model.
That said, I don't think this is being done well atm, but let's leave that topic for a future PR.
(for instance, for SMAC, I think we probably want to throw out the existing backend optimizer and bulk register everything in one shot at its initialization whenever we do that)

Back to this change, which was really only meant for code readability improvements,
in part what you're proposing here is to change the API to return NamedTuples of individual (config, score, context, metadata) instances instead of DataFrames of the same. That is a much wider reaching change than I was originally envisioning, I think.

Your proposal of having a class called an Observation is nice, but what I might do instead is something more like the following (quickly hacked together with some copilot help).

Note that we could pretty easily extend some of these helper functions for conversion to other data structure types in the future as well.

Some caveats:

The __iter__ method returns the wrong type info for scores and configs due to needing to support optional values for contexts and metadatas.

An alternative would be to explicitly change the received return type in all callsites, but that might be a bunch of work too.

There's probably some other refinements that could happen as well. As I said, this is just a quick stub idea.

"""Simple dataclass for storing observations from a hyperparameter optimization run."""

import itertools
from dataclasses import dataclass
from typing import List, Optional, Iterator

import pandas as pd


@dataclass(frozen=True)
class Observation:
    """Simple dataclass for storing a single Observation."""

    config: pd.Series
    score: pd.Series
    context: Optional[pd.Series] = None
    metadata: Optional[pd.Series] = None

    def __iter__(self) -> Iterator[Optional[pd.Series]]:
        """A not quite type correct hack to allow existing code to use the Tuple style
        return values.
        """
        # Note: this should be a more effecient return type than using astuple()
        # which makes deepcopies.
        return iter((self.config, self.score, self.context, self.metadata))


@dataclass(frozen=True)
class Observations:
    """Simple dataclass for storing observations from a hyperparameter optimization
    run.
    """

    configs: pd.DataFrame
    scores: pd.DataFrame
    context: Optional[pd.DataFrame] = None
    metadata: Optional[pd.DataFrame] = None

    def __post_init__(self) -> None:
        assert len(self.configs) == len(self.scores)
        if self.context is not None:
            assert len(self.configs) == len(self.context)
        if self.metadata is not None:
            assert len(self.configs) == len(self.metadata)

    def __iter__(self) -> Iterator[Optional[pd.DataFrame]]:
        """A not quite type correct hack to allow existing code to use the Tuple style
        return values.
        """
        # Note: this should be a more effecient return type than using astuple()
        # which makes deepcopies.
        return iter((self.configs, self.scores, self.context, self.metadata))

    def to_observation_list(self) -> List[Observation]:
        """Convert the Observations object to a list of Observation objects."""
        return [
            Observation(
                config=config,
                score=score,
                context=context,
                metadata=metadata,
            )
            for config, score, context, metadata in zip(
                self.configs.iterrows(),
                self.scores.iterrows(),
                self.context.iterrows() if self.context is not None else itertools.repeat(None),
                self.metadata.iterrows() if self.metadata is not None else itertools.repeat(None),
            )
        ]


def get_observations() -> Observations:
    """Get some dummy observations."""
    # Create some dummy data
    configs = pd.DataFrame(
        {
            "x": [1, 2, 3],
            "y": [4, 5, 6],
        }
    )
    scores = pd.DataFrame(
        {
            "score": [0.1, 0.2, 0.3],
        }
    )

    # Create an Observations object
    return Observations(configs=configs, scores=scores)


def test_observations() -> None:
    """Test the Observations dataclass."""
    # Create an Observations object
    observations = get_observations()
    observation = observations.to_observation_list()[0]

    # Print the Observations object
    print(observations)
    print(observations.configs, observations.scores)
    print(observation)
    print(observation.config, observation.score)

    # Or in tuple form using the __iter__ method:
    configs, scores, contexts, metadatas = get_observations()
    print(configs)
    print(scores)
    print(contexts)
    print(metadatas)

    config, score, context, metadata = get_observations().to_observation_list()[0]
    print(config)
    print(score)
    print(context)
    print(metadata)


if __name__ == "__main__":
    test_observations()

from mlos.

bpkroth commented on August 21, 2024

And tbh, that __iter__ hack kinda defeats the purpose a bit of what we're trying to do with this change - avoid mistakes of reordering metadata, context in return(ed) values by making them more explicit.

from mlos.

Split registering multiple configs into a seperate function about mlos HOT 3 OPEN

Comments (3)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent