Comments (2)

RickardSjogren commented on September 15, 2024

Hi @bgriffen, thank you for your interest and well-described questions! I will try to answer your questions below.

  1. If I remember correctly, the algorithm "forgets" previous runs. Since it uses least-squares modelling, there is a risk that the model would be biased towards the region of data space with more observations, i.e. in the direction from which previous iterations were run. The purpose of statistical experimental designs is to eliminate such bias and provide good support for the desired type of model.
  2. The algorithm returns the best experiment that has actually been run. The update between iterations is done based on the "interpolated" predictions. The reason is that the curvature of the true response surface probably does not perfectly match the model surface, so we return the run that we have confirmed is the best.
  3. Your model is a simple linear model that approximates the Gaussian surface very poorly, since the surface is very curved. The "fullfactorial2levels" design does not support quadratic models, so my first choice would be to try a response surface design that allows for quadratic models (e.g. "ccc"). Hopefully that will improve the convergence of the algorithm in this toy example as well.
  4. Hopefully this problem is solved by my suggestion in 3. We have used this algorithm in-house for many different real-world applications over a number of years, and we find that it converges well.
  5. predict_optimum is used to fit an OLS model to the response and find a predicted optimum. The limits of the design are then updated based on the predicted optimum in update_factors_from_optimum, and _new_optimization_design outputs a design based on the current state of the factors (see the sketch below).
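
To make that flow concrete, here is a minimal sketch of one optimization step using the method names above. The exact signatures and return values are assumptions, so read it as pseudocode rather than the actual internals:

def optimization_step(designer, measured_responses):
    # Fit an OLS model to the measured responses and predict where
    # the optimum lies on the model surface.
    predicted_optimum = designer.predict_optimum(measured_responses)

    # Re-center (and possibly shrink) the factor limits around the
    # predicted optimum.
    designer.update_factors_from_optimum(predicted_optimum)

    # Output the next design from the current state of the factors.
    return designer._new_optimization_design()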

I hope this answers your questions.

bgriffen commented on September 15, 2024

Hi Rickard -- thank you for the clarification. My understanding improves each day. Just a few additional comments/questions below, if that's OK.

  2. The algorithm returns the best experiment that has actually been run. The update between iterations is done based on the "interpolated" predictions. The reason is that the curvature of the true response surface probably does not perfectly match the model surface, so we return the run that we have confirmed is the best.

Indeed, I did think I was overstepping the mark with my multivariate normal distribution, though with a sufficiently large variance it should be easier to model with a fullfactorial2levels design. The predicted surface optimum is 0.01562, which is rather close to the actual optimum of 0.01592. The predicted values of A and B, however, are quite far off: A ~ 71 (actual = 60) and B ~ 90.5 (actual = 75). Then again, ff2level might not cut it even for large variances in the distribution (updated below).
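
For concreteness, a sketch of a toy response along these lines. The column names A and B and the exact covariance are assumptions; the covariance below is chosen so that the peak pdf, 1/(2*pi*sqrt(|cov|)), is ~0.0159, matching the optimum value quoted above:

from scipy.stats import multivariate_normal

# Bivariate normal response surface peaked at (A, B) = (60, 75).
# cov = diag(20, 5) gives a peak density of 1/(2*pi*10) ~ 0.0159.
rv = multivariate_normal(mean=[60.0, 75.0], cov=[[20.0, 0.0], [0.0, 5.0]])

def generate_response(df, rv):
    # Evaluate the pdf at each (A, B) setting in the design DataFrame.
    return rv.pdf(df[["A", "B"]].to_numpy())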

[figure: response]

  3. Your model is a simple linear model that approximates the Gaussian surface very poorly, since the surface is very curved. The "fullfactorial2levels" design does not support quadratic models, so my first choice would be to try a response surface design that allows for quadratic models (e.g. "ccc"). Hopefully that will improve the convergence of the algorithm in this toy example as well.

OK, I'll experiment with the other designs. The trouble I face is choosing a design that can be implemented physically versus what the response surface function may be (unknown a priori). I need to actually perform these experiments physically, not in silico, so a two-level or three-level full factorial is easier to arrange, but it may not be the best choice given that ccc allows quadratic responses to be well described.
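
For a rough feel for that trade-off, the run counts and factor levels can be compared with the pyDOE2 package (using pyDOE2 here is an assumption on my part, independent of how doepipeline generates its designs):

import numpy as np
from pyDOE2 import ff2n, ccdesign

# Two-level full factorial in 2 factors: 2 levels per factor, which
# supports main effects and interactions but not quadratic terms.
ff = ff2n(2)

# Circumscribed central composite design ('ccc'): factorial points plus
# axial and center points, i.e. 5 levels per factor, which is what
# makes quadratic terms estimable.
cc = ccdesign(2, face='circumscribed')

print(len(ff), np.unique(ff[:, 0]).size)               # runs, levels (factorial)
print(len(cc), np.unique(np.round(cc[:, 0], 6)).size)  # runs, levels (ccc)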

On that note, if I have historic data that doesn't fit the current design, e.g. it's a sparse sampling, can that data still be used to initialize the search? Or do you have to set up the whole pipeline "clean", carry out the experimental designs as generated by fullfactorial2levels, and attach their responses once measured? Additionally, once I have performed the fullfactorial2levels runs, can I then change the design mid-way through the optimization to get a better functional fit/prediction of the optimum?

  5. predict_optimum is used to fit an OLS model to the response and find a predicted optimum. The limits of the design are then updated based on the predicted optimum in update_factors_from_optimum, and _new_optimization_design outputs a design based on the current state of the factors.

I guess what I was poorly asking is whether I need to run a predict_optimum step first in order for the OLS model to be used to generate new experiments.

Just to double check: is my order of doing things correct? I have added comments to the lines in question below.

import pandas as pd
from doepipeline.designer import ExperimentDesigner  # assuming designer.py lives at doepipeline.designer

# factors, responses, generate_response, rv and number_of_iterations
# are defined earlier in my toy example.
exp = ExperimentDesigner(factors,
                         'ccc',
                         responses,
                         model_selection='greedy',
                         skip_screening=True,
                         shrinkage=0.9)

df = exp.new_design()

# single iteration e.g.
for niters in range(number_of_iterations):
    r_0 = generate_response(df, rv)  # measure (here: synthesize) the responses
    fractioni = pd.DataFrame.from_dict({"fraction": r_0})
    bi = exp.get_best_experiment(df, fractioni)  # get the best experiment of the run
    fi = exp.update_factors_from_optimum(bi)  # it is strange that it is named "from_optimum" when the function receives the result from exp.get_best_experiment?
    if fi[1]:  # if converged, stop generating new experiments
        break
    dfoptimal, model, prediction = exp.get_optimal_settings(fractioni)  # slight modification of what is returned, originally set in designer.py

    # generate a new design based on responses and best-guess optima
    df = exp.new_design()

I'm struggling to get the order of get_best_experiment, get_optimal_settings, update_factors_from_optimum and new_design correct, given the internal variables that each one updates. I'm just trying to put the logic of the functions in the right order given a response measured in each loop (though synthetically generated here).

Lastly, if a factor's weight in the prediction is ~0, is there a way to remove it when new_design() is run?

Thank you for your help. Much appreciated.
