
suntzu86 commented on August 18, 2024

Sorry I didn't respond sooner... just started a new job so I've been pretty busy. Plus I don't have a working linux installation at the moment to test on, lol.

I believe Branin should work b/c we have tested with it before. Although it is a very commonly considered case in academic papers, it never seemed particularly similar to our present MOE applications, so we stopped focusing on it.

So this should work. That said, a few things stand out to me...

  1. All the http endpoints return status messages (like indicating whether the solver converged/found an update). Are these coming back as true? (If not, we just return a random value or best guess, but it is not trustworthy.) See the sketch after this list for one way to check.
  2. You don't pass a hyperparameter_domain to gp_hyper_opt. I'm actually really surprised this doesn't fail. Could you tell me what the result of this line is in your run?
    https://github.com/Yelp/MOE/blob/master/moe/views/rest/gp_hyper_opt.py#L110

I'm not entirely sure what domain it thinks it's getting. My only guess is that the domain is coming out as [1.0, 1.0] for all hyperparameters. (Because the C++ accepts the domain in log10, so after exponentiating, 10 ** 0 = 1.) But I really don't know... that should have failed validation, sorry :(
  3. Following 2, can you print out the covariance hyperparams you're getting? Are they all 1.0?
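
For point 1, a minimal sketch of the kind of check meant here (the exact nesting of the status fields is an assumption, based on the example response quoted later in this thread):

# resp: the parsed JSON dict returned by the gp_hyper_opt endpoint (assumed shape)
resp = {'status': {'optimizer_success': {'log_marginal_likelihood_newton_found_update': True}}}

success_flags = resp['status']['optimizer_success']
if not all(success_flags.values()):
    print('solver did not converge; the returned values are only a best guess')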

Also, I didn't build or interact with the easyinterface stuff like at all, so please bear with me.

Lastly, you mean that when doing something like this:
https://github.com/Yelp/MOE/blob/master/moe_examples/next_point_via_simple_endpoint.py
you were not able to update the Experiment object from easyinterface? That doesn't make any sense b/c that previous example is one of our integration tests which are all passing in the most recent revision.
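
For reference, the core pattern from that example (condensed from MOE's README; the seed values here are placeholders):

from moe.easy_interface.experiment import Experiment
from moe.easy_interface.simple_endpoint import gp_next_points
from moe.optimal_learning.python.data_containers import SamplePoint

# Build an experiment over a 2-d domain and seed it with one observation.
exp = Experiment([[0, 2], [0, 4]])
exp.historical_data.append_sample_points([SamplePoint([0.5, 1.0], 1.0, 0.01)])

# Ask MOE for the next point to sample, then append the new observation.
next_point = gp_next_points(exp)[0]
exp.historical_data.append_sample_points([SamplePoint(next_point, 0.7, 0.01)])  # 0.7 is a stand-in value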


dgreis commented on August 18, 2024

Answering your q's:

  1. All the JSON responses in my docker console have messages like "status" : "optimizer_success", so it seems the solver is working...

  2. I'm running this from an IPython notebook. My kernel always dies when I run:
    from moe.views.rest import gp_hyper_opt
    So I'm using
    from moe.easy_interface.simple_endpoint import gp_hyper_opt
    instead.

I still tried adding a print statement to moe/views/rest/gp_hyper_opt.py to get the value of hyperparameter_domain, but the call from simple_endpoint doesn't seem to run through that file. Or maybe I'm doing something wrong...

Nonetheless, I refactored the cycle_moe() and grab_pts_sampled(exp) functions as follows, to include the hyperparameter_domain_info:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1  # one length scale per dimension, plus signal variance
            covariance_info = gp_hyper_opt(points_sampled, rest_host='192.168.59.103',
                **{'hyperparameter_domain_info': {
                    'dim': hyper_dim,
                    'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim,
                }})
            print(str(i) + ', time taken: ' + str(round(time.time() - t0, 3)))
            tdf[str(i)] = [time.time() - t0]  # indexed by i so each timing gets its own column
        next_point = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]
        f_next = branin(next_point)  # + npr.normal(scale=std)
        rdf = rdf.append({'x1': next_point[0], 'x2': next_point[1], 'f': f_next,
                          'cov_hyps': covariance_info['hyperparameters']}, ignore_index=True)

        #points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        #exp = Experiment(bounds)
        #points.append(new_point)
        #exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1', 'x2', 'f', 'cov_hyps']], tdf, exp, covariance_info

def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [SamplePoint(list(pts[i,:]),f_val[i],f_var[i]) for i in range(len(pts))] #Changed this line
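
(Aside: the branin helper called above is never shown in this thread; a standard Branin-Hoo definition, which is my assumption about what was used, is:)

import numpy as np

def branin(x):
    """Branin-Hoo function; global minima ~0.398 at (-pi, 12.275), (pi, 2.275), (9.42478, 2.475)."""
    x1, x2 = x
    a = 1.0
    b = 5.1 / (4.0 * np.pi ** 2)
    c = 5.0 / np.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * np.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * np.cos(x1) + s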

As a side note, it seems like I can actually add to experiments from the easy interface, so it doesn't seem like that is actually an issue anymore.

  3. When I cycle this 50 times with 5 training points to start (just like I did in my first post), the hyperparameter values stick around [2.0, 2.0, 2.0] in rdf. When I ran this for 100 iterations, some of the values did depart from 2.0, but the overall values are still not converging to any of the global minima.


suntzu86 commented on August 18, 2024
  1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.
  2. Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

But that function is definitely being called. It's the only "view" that handles hyperparameter optimization.
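
To illustrate (a sketch of the same request made by hand; the '/gp/hyper_opt' route and port 6543 are MOE's defaults as I recall them, so treat them as assumptions):

import json
import urllib2  # the library easy_interface wraps

# One POST, no local compute; the server in docker does all the work.
payload = {
    'gp_historical_info': {'points_sampled': [
        {'point': [0.5, 1.0], 'value': 1.2, 'value_var': 0.01},
        {'point': [1.5, 3.0], 'value': -0.3, 'value_var': 0.01},
    ]},
    'domain_info': {'dim': 2},
    'hyperparameter_domain_info': {
        'dim': 3,
        'domain_bounds': [{'min': 0.01, 'max': 3.0}] * 3,
    },
}
request = urllib2.Request(
    'http://localhost:6543/gp/hyper_opt',
    json.dumps(payload),
    {'Content-Type': 'application/json'},
)
response = json.loads(urllib2.urlopen(request).read())
print(response['status'])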

  3. Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.
  4. You're running the C++ right? Not forcing the stuff in python_version to run somehow?
  5. what does the scatter plot of points explored look like?


dgreis commented on August 18, 2024
> 1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.

I see things like:

'status': {'optimizer_success': {'log_marginal_likelihood_newton_found_update': True}}

> Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Not the Linux kernel, the IPython kernel. I don't think this is important, because this isn't even where the code is executing; it's executing in Docker, as you pointed out.

> Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

I spent hours trying to print the value of hyperparameter_domain in views/rest/gp_hyper_opt.py (print to stdout, pass to the json output, write to a file), but all of them caused the build to fail tests and resulted in a bad request (500 error) answer from Docker. I'm out of ideas to debug this. Is knowing this absolutely necessary?

(as a side note, #401 turned out to be a blocker to do code edits easily. I commented there too)

> Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.

When I said, "the overall values are still not converging to any of the global minima," I meant the overall values of the branin function.

I changed the line in my cycle_moe() from

'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim

to:

'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]

This didn't seem to have any impact.

> You're running the C++ right? Not forcing the stuff in python_version to run somehow?

I'm not sure how to verify whether I'm doing this or not, but I don't think I've explicitly or intentionally done this...

> what does the scatter plot of points explored look like?

[image: moe_points_explored_branin — scatter of the points explored]

This is 50 iterations. For reference, the global optima of the branin are at (-pi, 12.275), (pi, 2.275), and (9.42478, 2.475). It doesn't look like MOE is getting close to finding any of them. It's hard to understand the sequence of points explored in a scatter plot, so I tried to color the points so the earlier ones are more red and the later ones more blue. If you run cycle_moe(), it's much easier to see the progression just by looking at the x1 and x2 columns in rdf.
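
For anyone reproducing that plot, a minimal sketch of the coloring described (assumes the rdf DataFrame returned by cycle_moe above; the colormap choice is mine):

import matplotlib.pyplot as plt
import numpy as np

n = len(rdf)
colors = plt.cm.coolwarm_r(np.linspace(0, 1, n))  # early iterations red, late ones blue
plt.scatter(rdf['x1'], rdf['x2'], c=colors)
# Mark the three true branin optima for reference.
for opt in [(-np.pi, 12.275), (np.pi, 2.275), (9.42478, 2.475)]:
    plt.scatter(opt[0], opt[1], marker='*', s=200, color='k')
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()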


suntzu86 commented on August 18, 2024
  • Ok. There should be a msg like that returned by the next_points call too.
  • Still weird. I don't know why it would cause IPython to freeze. I don't use docker (I compile, build, and run locally), so I have no guesses there.
  • The hyperparam domain should be passed correctly, especially now that you're specifying it explicitly, so don't worry about this anymore. I just wasn't sure what the default value (when you didn't specify it) was. Sorry that has been such a pain for you :( I didn't really connect the dots there... been on antibiotics for the last week and my brain is wonky. Dev'ing on docker is a huge pain; I avoid it. You would probably need to turn off line 35 of the dockerfile to prevent test failure, and I'm not sure where print statements to stdout go inside of docker, probably some configurable log.
  • Re #401: sorry again about the lack of docker guidance & it being painful. I know very little about it. I thought that was already handled previously. The original intent was for only maintainers to mess with it and have people build locally if they were doing dev stuff, I think?
  • No impact... ok. I haven't run locally yet. Got my stuff set up this weekend, but I upgraded ubuntu too far and now all kinds of tests are failing on gcc 5. Working on it...
  • Again brainfart. If you're in docker then the C++ is running.
  • Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var:
https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py
The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

Anyway, sorry I don't have a working install yet but I'll get there. Should have looked into gcc default version before upgrading to ubuntu 15.10.
Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)


suntzu86 commented on August 18, 2024

Also, does this guy in the moe_examples folder work for you? It basically does the same thing as your script, I think...

https://github.com/Yelp/MOE/blob/master/moe_examples/combined_example.py

You may want to disable the noise in function_to_minimize. It looks like the tests for the files in moe_examples don't verify correctness (just that they run).


dgreis commented on August 18, 2024

> Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

There are multiple minima (all equal to ~0.39) in the branin function (see the function description), so this is not entirely straightforward.

> Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var:
> https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py
> The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

These are "steps"... to what? I didn't understand what this was for, but I did it anyway. I couldn't add it to moe/views/rest/gp_ei.py because of the same IPython-kernel-dying issue as before (let's leave that issue aside), but I was able to get it to work by keeping it in simple_endpoint.py. I had to make a few more changes to the file, and I also changed the name so it wouldn't clash with the existing gp_mean_var. Renamed to my_grab_ei, it looks like this:

from moe.views.schemas.rest.gp_ei import GpEiResponse #Added
from moe.views.constant import GP_EI_ROUTE_NAME, GP_EI_PRETTY_ROUTE_NAME #Added

def my_grab_ei(
        points_sampled,
        points_to_evaluate,
        rest_host=DEFAULT_HOST,
        rest_port=DEFAULT_PORT,
        testapp=None,
        **kwargs
):
    """Hit the rest endpoint for calculating the posterior mean and variance of a gaussian process, given points already sampled."""
    endpoint = ALL_REST_ROUTES_ROUTE_NAME_TO_ENDPOINT[GP_EI_ROUTE_NAME] ##Changed
    raw_payload = kwargs.copy()  # Any options can be set via the kwargs ('covariance_info' etc.)

    raw_payload['points_to_evaluate'] = points_to_evaluate

    # Sanitize input points
    points_sampled_clean = [SamplePoint._make(point) for point in points_sampled]
    historical_data = HistoricalData(
            len(points_to_evaluate[0]),  # The dim of the space
            sample_points=points_sampled_clean,
            )

    if 'gp_historical_info' not in raw_payload:
        raw_payload['gp_historical_info'] = historical_data.json_payload()

    if 'domain_info' not in raw_payload:
        raw_payload['domain_info'] = {'dim': len(points_to_evaluate[0])}

    json_payload = json.dumps(raw_payload)

    json_response = call_endpoint_with_payload(rest_host, rest_port, endpoint, json_payload, testapp)

    output = GpEiResponse().deserialize(json_response) ##Changed

    return output.get('expected_improvement') ##Changed

Then I added this to my cycle_moe function, which now looks like this:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled, rest_host='192.168.99.100',
                **{'hyperparameter_domain_info': {
                    'dim': hyper_dim,
                    'domain_bounds': [{'min': 0.01, 'max': 3.0}, {'min': 0.03, 'max': 15}, {'min': 0.03, 'max': 15}],
                    #'domain_bounds': [{'min': 0.01, 'max': 2.0}] * hyper_dim
                }})
            print(str(i) + ', time taken: ' + str(round(time.time() - t0, 3)))
            tdf[str(i)] = [time.time() - t0]  # indexed by i so each timing gets its own column
        next_point = gp_next_points(exp, rest_host='192.168.99.100', **{'covariance_info': covariance_info})[0]
        f_next = branin(next_point)  # + npr.normal(scale=std)
        points_sampled = grab_pts_sampled(exp)  ##Added
        ei = my_grab_ei(points_sampled, [next_point], rest_host='192.168.99.100', **{'covariance_info': covariance_info})[0]  ##Added
        rdf = rdf.append({'x1': next_point[0], 'x2': next_point[1], 'f': f_next, 'ei': ei,  ##Modified
                          'cov_hyps': covariance_info['hyperparameters']}, ignore_index=True)
        new_point = SamplePoint(next_point, f_next, 0)
        #exp = Experiment(bounds)  # keep commented out: re-creating exp here wipes historical_data each iteration
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1', 'x2', 'f', 'ei', 'cov_hyps']], tdf, exp, covariance_info  ##Modified

I don't know how best to summarize this. It will probably be best to run it yourself and look at the contents of rdf, but here is the output from rdf.ei.describe():

count     50.000000
mean      47.673862
std       48.591592
min        2.522790
25%       10.211796
50%       22.794516
75%       85.673810
max      188.925347

> Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)

I'm on OS X Yosemite (10.10.5) and gcc --version returns 4.9.2. But I think you're right re: Docker handling this stuff...

Also, I got moe_examples to work. The output is below. I guess it is converging to [1, 2.6]? But still, I need to make sure it converges in all cases, not just this one.

If I can't turn the corner on this problem soon, I think I'm gonna have to abandon MOE :(

Sampled f([1.88891236067, 4.0]) = 3.024281515239389817E-01
Sampled f([0.0, 2.9618376054]) = -9.838875168897800449E-01
Sampled f([0.0, 3.82570246393]) = -7.749819391510606170E-01
Sampled f([0.786838585499, 2.62777516463]) = -1.579648346558538918E+00
Sampled f([1.20312582365, 2.08982756487]) = -1.451454321864257269E+00
Updated covariance_info with {'hyperparameters': [0.823360900509, 1.01663242597, 1.28659618664], 'covariance_type': u'square_exponential'}
Sampled f([0.758268089974, 2.2331219232]) = -1.411622052413112893E+00
Sampled f([1.19599259406, 2.57708248725]) = -1.593354050232974606E+00
Sampled f([1.03968887999, 2.56816139896]) = -1.617582421292888650E+00
Sampled f([2.0, 2.24190928674]) = -1.018767759754114710E+00
Sampled f([1.05579477819, 2.56319110027]) = -1.616923815055614444E+00
Updated covariance_info with {'hyperparameters': [0.789039406073, 1.23435905217, 1.30371483948], 'covariance_type': u'square_exponential'}
Sampled f([1.0082757694, 2.62978522016]) = -1.616789052180044095E+00
Sampled f([2.0, 0.639342249258]) = -1.468008111448600994E-01
Sampled f([1.07167325368, 2.5252282002]) = -1.614562666452685313E+00
Sampled f([0.863802755471, 2.92560868142]) = -1.540054847282929629E+00
Sampled f([1.03653075905, 2.56537496547]) = -1.617587842392238073E+00
Updated covariance_info with {'hyperparameters': [0.737969968335, 1.24276395575, 1.37894109572], 'covariance_type': u'square_exponential'}
Sampled f([1.01903375152, 2.59415144623]) = -1.617994034119857982E+00
Sampled f([1.03422357833, 2.56392785481]) = -1.617583817670746216E+00
Sampled f([1.03150566466, 2.56489031089]) = -1.617640394284993066E+00
Sampled f([1.02916662686, 2.56574681541]) = -1.617681716433355454E+00
Sampled f([1.02711746293, 2.56651917111]) = -1.617712335919329059E+00
GP mean at (0, 0), (0.1, 0.1), ...: [0.98974570661, 0.963411209612, 0.911656230847, 0.835017255388, 0.735036207222, 0.614126994898, 0.475415337389, 0.322579331855, 0.159711048117, -0.00879206630745]


suntzu86 commented on August 18, 2024

I can understand having to abandon MOE... I'd be annoyed too if I were stuck for so long :( I was hoping to spend some time on this over the weekend but I ended up having to spend the weekend working on things for my job.

So I have a couple more suggestions for you; I'll try to debug it myself this week, as work has hopefully settled down. Anyway...

  • The point of the EI thing was to try and see if MOE's internal optimizer was failing to find good "next_points" choices OR if MOE's internal model was wrong. EI (expected improvement) is MOE's proxy for the real world: it optimizes EI internally, and "next_points" is the point of best EI. You can pass a list of lists into points_to_evaluate, so before updating historical_data, query EI with the coordinate list [next_point, optimum_1, optimum_2, optimum_3] (sketched below). Then we could see if EI is better at next_point or at any of the true optima. This isn't critical though, it's more something for me to look at. Sorry about the confusion, but that's what I was thinking.
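
Concretely, with the my_grab_ei helper from the previous comment, that check could look like this (a sketch; the optima coordinates are the ones listed earlier in the thread):

# Compare EI at MOE's suggested point vs. the three known branin optima.
optima = [[-3.14159, 12.275], [3.14159, 2.275], [9.42478, 2.475]]
ei_values = my_grab_ei(
    points_sampled,            # from grab_pts_sampled(exp)
    [next_point] + optima,     # points_to_evaluate: a flat list of coordinate lists
    rest_host='192.168.99.100',
    **{'covariance_info': covariance_info}
)
for label, ei in zip(['next_point', 'optimum_1', 'optimum_2', 'optimum_3'], ei_values):
    print(label + ': ' + str(ei))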

Quick things to try:

  • It's possible that the default optimizer settings for EI are not appropriate here. Try adding the following to your POST request (so pass it in as a kwarg to the gp_next_points function, just like you pass in covariance_info now; see the sketch after this list): {"optimizer_info": {"optimizer_type": "gradient_descent_optimizer", "num_multistarts": 2000, "optimizer_parameters": {"gamma": 0.7, "pre_mult": 1.2, "max_relative_change": 0.5, "tolerance": 1.0e-7, "max_num_steps": 1200}}}. Try "gamma": 0.5 too perhaps. (This may be a lot slower... if it's too slow, try decreasing num_multistarts. The value you're currently using is 400.)
  • It's possible that gradient descent is just sucking for this task (newton would be vastly better; working on getting that together). So let's try dumb search: {"optimizer_info": {"optimizer_type": "null_optimizer", "num_random_samples": 10000000}}. (You may have to put "optimizer_parameters": {} in there too but I don't think so.) 10 million is a bit of a guess; if that's too slow, decrease it. If it's still super fast, try increasing it.
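
A sketch of passing the first of these through easy_interface, using the same kwargs mechanism as covariance_info in cycle_moe (field spellings as given above):

gd_optimizer_info = {
    'optimizer_type': 'gradient_descent_optimizer',
    'num_multistarts': 2000,
    'optimizer_parameters': {
        'gamma': 0.7,
        'pre_mult': 1.2,
        'max_relative_change': 0.5,
        'tolerance': 1.0e-7,
        'max_num_steps': 1200,
    },
}
# For the dumb-search variant, swap in:
# {'optimizer_type': 'null_optimizer', 'num_random_samples': 10000000}
next_point = gp_next_points(
    exp,
    rest_host='192.168.99.100',
    **{'covariance_info': covariance_info, 'optimizer_info': gd_optimizer_info}
)[0]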

Hopefully I spelled all those fields correctly, lol.

Edit:

  • if you give up on MOE, I'll still eventually fix this b/c this is a pretty common case. I'll keep you posted.
  • if you got moe_examples working (and that does look like it is converging), can you try putting branin into that routine you ran? I don't think there's anything wrong w/ your script, but let's verify.


johnroach commented on August 18, 2024

Thank you for helping out @suntzu86 . Have you been able to work on this? I am having a similar problem.


suntzu86 commented on August 18, 2024

Nope, I haven't had a chance to dive deeper into this.

A team at Cornell where some of the original research for MOE was done put a paper up on arxiv recently:
http://arxiv.org/abs/1602.05149

They run a branin case successfully, code here:
http://pastebin.com/2syieQKU

It's not going through the REST API, but you can hopefully see how the python objects map onto the API's JSON fields (like TensorProductDomain -> domain); a rough sketch of the correspondence is below.
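
Roughly (my reading, not verified against that script; the JSON field shapes follow the payloads used earlier in this thread):

# Python objects in the pastebin script        ->  REST API JSON fields
#
# TensorProductDomain([ClosedInterval(0.0, 15.0), ...])
#     -> 'domain_info': {'dim': 2,
#                        'domain_bounds': [{'min': 0.0, 'max': 15.0}, ...]}
#
# HistoricalData(dim, sample_points=[SamplePoint(p, v, nv), ...])
#     -> 'gp_historical_info': {'points_sampled':
#            [{'point': p, 'value': v, 'value_var': nv}, ...]}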

