Hi @bgriffen, thank you for your interest and well-described questions! I will try to answer them below.
- If I remember correctly, the algorithm "forgets" previous runs. Since it uses least-squares modelling, there is a risk that the model will be biased towards the region of data space where there are more observations, i.e. in the direction from which previous iterations were run. The purpose of statistical experimental designs is to eliminate bias and provide good support for the desired type of model.
- The algorithm returns the best experiment that has actually been run. The update between iterations is done based on the "interpolated" predictions. The reason is that the curvature of the response surface probably does not perfectly match the model surface, meaning that we return the run that we have confirmed is the best.
- Your model is a simple linear model that approximates the Gaussian surface very poorly, since the surface is very curved. The `fullfactorial2levels` design does not support quadratic models, so my first choice would be to try a response surface design that allows for quadratic models (e.g. `ccc`). Hopefully that will improve the convergence of the algorithm in this toy example as well.
- Hopefully this problem is solved by my suggestion in 3. We have used this algorithm in-house for many different real-world applications for a number of years, and find that it converges well.
- To model the response with OLS and find a predicted optimum, `predict_optimum` is used. The limits of the design are updated based on the predicted optimum in `update_factors_from_optimum`. `_new_optimization_design` outputs a design based on the current state of the factors.
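To illustrate the first point, here is a tiny sketch (plain Python, made-up numbers) of how observations clustered on one side of the region bias a least-squares line, compared to a symmetric design over the same region:

```python
def fit_line(xs, ys):
    # ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

true = lambda x: x * x  # a curved "response surface"

# observations piled up on one side (like runs inherited from earlier iterations)
xs_biased = [2.0, 2.2, 2.5, 2.8, 3.0]
a1, b1 = fit_line(xs_biased, [true(x) for x in xs_biased])

# a symmetric design over the region of interest
xs_design = [-3.0, -1.5, 0.0, 1.5, 3.0]
a2, b2 = fit_line(xs_design, [true(x) for x in xs_design])

# prediction error at the far edge of the region
err_biased = abs((a1 * -2 + b1) - true(-2))
err_design = abs((a2 * -2 + b2) - true(-2))
print(err_biased > err_design)  # True: the one-sided fit extrapolates badly
```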
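To sketch that last step with hypothetical numbers (the idea only, not doepipeline's actual code): fit a quadratic to the measured responses by OLS and take its stationary point as the predicted optimum, around which the factor limits are then re-centred:

```python
def predicted_optimum(y_lo, y_mid, y_hi):
    """Stationary point of the quadratic through (-1, y_lo), (0, y_mid), (1, y_hi)."""
    b1 = (y_hi - y_lo) / 2               # linear coefficient
    b2 = (y_lo - 2 * y_mid + y_hi) / 2   # quadratic coefficient
    return -b1 / (2 * b2)                # dy/dx = 0  ->  x* = -b1 / (2*b2)

# hypothetical response y = 5 - (x - 0.3)**2, measured at coded levels -1, 0, +1
f = lambda x: 5 - (x - 0.3) ** 2
x_star = predicted_optimum(f(-1), f(0), f(1))
print(round(x_star, 6))  # 0.3 -- the design limits would then be re-centred here
```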
I hope this answers your questions.
from doepipeline.
Hi Richard -- thank you for the clarification. My understanding improves each day. Just a few additional comments/questions below if it's OK.
> The algorithm returns the best experiment that has actually been run. The update between iterations is done based on the "interpolated" predictions. The reason is that the curvature of the response surface probably does not perfectly match the model surface, meaning that we return the run that we have confirmed is the best.
Indeed, I did think I was overstepping the mark with my multivariate normal distribution. Though, with a sufficiently large variance, it should be easier to model with a `fullfactorial2levels` design. Indeed, the predicted surface optimum is 0.01562, which is rather close to the actual optimum of 0.01592. The predicted values of A and B, however, are quite far off: A ~ 71 (actual = 60) and B ~ 90.5 (actual = 75). Then again, ff2level might not cut it, even for large variances in the distribution (updated below).
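As a sanity check on that number: if the toy surface is an uncorrelated bivariate normal density (peak at the mean, here A = 60, B = 75) with σ_A·σ_B = 10, its maximum works out to 1/(2π·10), which reproduces the "actual optimum" above:

```python
import math

# peak of an uncorrelated bivariate normal pdf, attained at the mean:
# f_max = 1 / (2 * pi * sigma_a * sigma_b)
def peak_height(sigma_a, sigma_b):
    return 1.0 / (2 * math.pi * sigma_a * sigma_b)

# any sigma pair with product 10 gives the same peak value
print(round(peak_height(5, 2), 5))  # 0.01592
```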
> Your model is a simple linear model that approximates the Gaussian surface very poorly, since the surface is very curved. The `fullfactorial2levels` design does not support quadratic models, so my first choice would be to try a response surface design that allows for quadratic models (e.g. `ccc`). Hopefully that will improve the convergence of the algorithm in this toy example as well.
OK, I'll experiment with the other designs. The trouble I face is choosing a design that can be implemented physically versus one that suits the response surface function (which is unknown a priori). I need to actually perform these experiments physically, not in silico, so a two-level or three-level full factorial is easier to arrange, but it may not be the best choice given that `ccc` allows quadratic responses to be well described.
On that note: if I have historic data that doesn't fit the current design, e.g. a sparse sampling, can that data still be used to initialize the search? Or do I have to set up the whole pipeline "clean" and carry out the experimental designs as generated by `fullfactorial2levels`, attaching their responses once measured? Additionally, say I perform the `fullfactorial2levels` runs; can I then change the design mid-way through the optimization to get a better functional fit/prediction of the optima?
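Incidentally, a quick way I convinced myself why a two-level factorial can't support quadratic terms: at coded levels ±1 the squared columns of the model matrix collapse into the intercept, while a central composite design keeps them estimable (face-centred CCD below for simplicity; `ccc` is the circumscribed variant):

```python
import numpy as np

def model_matrix(points):
    # columns: intercept, A, B, A^2, B^2, A*B (full quadratic model in 2 factors)
    return np.array([[1, a, b, a * a, b * b, a * b] for a, b in points], float)

# two-level full factorial: corners of the square only
ff2 = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# central composite design: corners + axial points + centre point
ccd = ff2 + [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

print(np.linalg.matrix_rank(model_matrix(ff2)))  # 4 -> quadratic terms aliased
print(np.linalg.matrix_rank(model_matrix(ccd)))  # 6 -> full quadratic estimable
```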
> To model the response with OLS and find a predicted optimum, `predict_optimum` is used. The limits of the design are updated based on the predicted optimum in `update_factors_from_optimum`. `_new_optimization_design` outputs a design based on the current state of the factors.
I guess what I asked poorly was whether I need to run a `predict_optimum` step first in order for the OLS model to be used to generate new experiments.
Just to double check -- is my order of doing things correct? I added comments to the lines in question below.
```python
exp = ExperimentDesigner(factors,
                         'ccc',
                         responses,
                         model_selection='greedy',
                         skip_screening=True,
                         shrinkage=0.9)
df = exp.new_design()

# single iteration e.g.
for niters in range(number_of_iterations):
    r_0 = generate_response(df, rv)
    fractioni = pd.DataFrame.from_dict({"fraction": r_0})
    # get the best experiment of the run
    bi = exp.get_best_experiment(df, fractioni)
    # it is strange that it is named "from_optimum", but the function
    # receives the result from exp.get_best_experiment?
    fi = exp.update_factors_from_optimum(bi)
    if fi[1]:  # if converged, stop generating new experiments
        break
    # slight modification of what is returned, originally set in designer.py
    dfoptimal, model, prediction = exp.get_optimal_settings(fractioni)
    # generate a new design based on responses and best-guess optima
    df = exp.new_design()
```
I'm just struggling to get the order of `get_best_experiment`, `get_optimal_settings`, `update_factors_from_optimum`, and `new_design` correct, given the internal variables that are updated. I'm just trying to get the logic of the functions in the correct order, given a response measured in each loop (though synthetically generated here).
Lastly, if a factor's weight in the prediction is ~0, is there a way to remove it when `new_design()` is run?
Thank you for your help. Much appreciated.