tirthajyoti / doepy

Design of Experiment Generator. Read the docs at: https://doepy.readthedocs.io/en/latest/

License: MIT License

Python 100.00%

design design-of-experiments statistics engineering science research phsyics python doe random-design

doepy's Issues

Analyzing doe?

Hello,
Once a DOE is generated and the results are acquired, how should the results be analyzed?
Do you plan to integrate the corresponding statistical analysis tools (such as ANOVA, regression, etc.) into doepy, or can you advise which Python libraries to use for analyzing the results?
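As a minimal sketch of one possible first analysis step, the snippet below estimates main effects for a two-level design using only the standard library. It is independent of doepy: the `main_effects` helper, the "Yield" response column, and the response numbers are all made up for illustration. For a full ANOVA or regression, a library such as statsmodels (formula-based `ols` together with `anova_lm`) is a common choice.

```python
from statistics import mean

def main_effects(runs, factors, response):
    """Return {factor: mean(response at high level) - mean(response at low level)}."""
    effects = {}
    for name in factors:
        low = min(r[name] for r in runs)
        high = max(r[name] for r in runs)
        at_low = [r[response] for r in runs if r[name] == low]
        at_high = [r[response] for r in runs if r[name] == high]
        effects[name] = mean(at_high) - mean(at_low)
    return effects

# Illustrative, made-up data: a 2x2 full factorial with a hypothetical "Yield" response.
runs = [
    {"Pressure": 40, "Temperature": 290, "Yield": 10.0},
    {"Pressure": 70, "Temperature": 290, "Yield": 14.0},
    {"Pressure": 40, "Temperature": 350, "Yield": 11.0},
    {"Pressure": 70, "Temperature": 350, "Yield": 15.0},
]
effects = main_effects(runs, ["Pressure", "Temperature"], "Yield")
print(effects)
```

If the design lives in a pandas DataFrame, `df.to_dict("records")` yields rows in the shape this helper expects.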

Constraints

Hi there, is it possible to introduce constraints such as A + B + C = 1?
Thank you!
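A constraint such as A + B + C = 1 describes a mixture design. The sketch below is not a doepy feature; it is a standalone simplex-lattice generator (the `simplex_lattice` name and its `m` parameter are hypothetical) that enumerates every blend whose proportions are multiples of 1/m and sum to exactly 1.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(factors, m):
    """All blends whose proportions are multiples of 1/m and sum to exactly 1."""
    points = []
    for counts in product(range(m + 1), repeat=len(factors)):
        # Keep only the integer combinations that add up to m (i.e. proportions add up to 1).
        if sum(counts) == m:
            points.append({f: Fraction(c, m) for f, c in zip(factors, counts)})
    return points

design = simplex_lattice(["A", "B", "C"], 2)
for point in design:
    print({name: float(value) for name, value in point.items()})
```

Using `Fraction` keeps the sum exact; convert to `float` only when exporting the design.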

prob_distribution

Hello, is it correct that the prob_distribution argument is not used for the LHS model in the build_lhs function?

Critical issue: full_fact fails after recent update of pandas to 1.5.0

This code from your README file worked until pandas was updated to 1.5.0.

from doepy import build
df = build.full_fact(
{
    'Pressure':[40,55,70],
    'Temperature':[290, 320, 350],
    'Flow rate':[0.2,0.4], 
    'Time':[5,8]}
)
print(df)

It now gives the following. Note how Flow rate returns the values 0 and 1 instead of 0.2 and 0.4.

    Pressure  Temperature  Flow rate  Time
0       40.0        290.0        0.0   5.0
1       55.0        290.0        0.0   5.0
...
7       55.0        350.0        0.0   5.0
8       70.0        350.0        0.0   5.0
9       40.0        290.0        1.0   5.0
10      55.0        290.0        1.0   5.0
...
16      55.0        350.0        1.0   5.0
17      70.0        350.0        1.0   5.0

I did a bit of digging around and the issue seems to depend on the factor values. Once you switch a factor to float, the value is no longer returned; just 0, 1, 2, and so on:

from doepy import build
df = build.full_fact(
{
    'Pressure':[40.1,55,70],
    'Flow rate':[5, 8.123], 
})
print(df)

   Pressure  Flow rate
0       0.0        5.0
1      55.0        5.0
2      70.0        5.0
3       0.0        1.0
4      55.0        1.0
5      70.0        1.0

Python version used was 3.9
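Until the behavior is fixed, one possible workaround is to build the full factorial directly from the level lists so that the actual values are never replaced by level indices. This is a minimal sketch that does not use doepy; `full_factorial` is a hypothetical helper, and the row order (first factor varying fastest) is chosen to match the output shown above.

```python
from itertools import product

def full_factorial(levels):
    """Return one dict per run, keeping the original level values untouched."""
    names = list(levels)
    rows = []
    # product() varies its last argument fastest, so iterate over the factors
    # in reverse to make the first factor vary fastest.
    for combo in product(*(levels[name] for name in reversed(names))):
        rows.append(dict(zip(names, reversed(combo))))
    return rows

rows = full_factorial({
    'Pressure': [40, 55, 70],
    'Temperature': [290, 320, 350],
    'Flow rate': [0.2, 0.4],
    'Time': [5, 8],
})
print(len(rows))
print(rows[0])
print(rows[9])
```

If pandas is available, `pd.DataFrame(rows)` turns the result into a table with one column per factor.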

Parsing factor levels for two leveled designs

Thank you for this great work.
I think the code snippet used to trim factor_level_ranges in each two-level design should search for the min and max values of the array and reject duplicate values.
Something like:

for key in factor_level_ranges:
    if len(factor_level_ranges[key]) != 2:
        # Compute both end points before overwriting anything in the list.
        low, high = min(factor_level_ranges[key]), max(factor_level_ranges[key])
        factor_level_ranges[key] = [low, high]
        print(
            f"{key} had more than two levels. Assigning the end point to the high level."
        )
    if factor_level_ranges[key][0] == factor_level_ranges[key][1]:
        rep_value = factor_level_ranges[key][0]
        raise ValueError(f"duplicate value '{rep_value}' found in key '{key}'")

Furthermore, this code should be moved into a function or a decorator, since it is reused in many places in the code.
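The suggestion to factor this logic out could look roughly like the sketch below. `to_two_levels` is a hypothetical name, not an existing doepy function; unlike the snippet above, it returns a new dictionary instead of modifying the input.

```python
def to_two_levels(factor_level_ranges):
    """Return a new dict mapping each factor to [min, max] of its levels."""
    trimmed = {}
    for key, levels in factor_level_ranges.items():
        low, high = min(levels), max(levels)
        if low == high:
            raise ValueError(f"duplicate value '{low}' found in key '{key}'")
        if len(levels) != 2:
            print(f"{key} had {len(levels)} levels. Using its min and max as low and high.")
        trimmed[key] = [low, high]
    return trimmed

print(to_two_levels({"Pressure": [40, 55, 70], "Time": [5, 8]}))
```

Note that this sketch always orders the pair as [low, high], so an input such as [8, 5] comes back as [5, 8].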

replicates?

Hello,
How should replication be handled? Is the DOE designer capable of designing replicated experiments?
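Whether or not the library offers this directly, replication can be applied to any generated design after the fact. The sketch below assumes the design is available as a list of row dictionaries; `replicate` and the "Replicate" column name are hypothetical.

```python
import random

def replicate(runs, n_replicates, shuffle=False, seed=None):
    """Repeat every run n_replicates times and tag each copy with its replicate number."""
    out = []
    for rep in range(1, n_replicates + 1):
        for run in runs:
            out.append({**run, "Replicate": rep})
    if shuffle:
        # Randomize the run order; a fixed seed makes the order reproducible.
        random.Random(seed).shuffle(out)
    return out

base = [{"Pressure": 40, "Time": 5}, {"Pressure": 70, "Time": 8}]
replicated = replicate(base, 3)
print(len(replicated))
print(replicated[0], replicated[-1])
```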

Float precision

In doe_function.construct_df, the pandas DataFrame is forced to dtype=float32. This leads to a non-negligible loss of accuracy. Double-precision floats are now common in all applications.

Is there any reason for this dtype specification?

If yes, I would propose to make it an option
If no, I would propose to remove it and let pandas handle the dtype from the given array.

Thanks.
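To illustrate the size of the rounding involved, here is a small standard-library sketch that round-trips a Python float (double precision) through a 32-bit float. It does not touch doepy or pandas; `as_float32` is a hypothetical helper.

```python
import struct

def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and return it."""
    return struct.unpack("f", struct.pack("f", x))[0]

value = 8.123
stored = as_float32(value)
print(value, stored, abs(stored - value))
```

The stored value differs from the input from roughly the seventh significant digit onward, which matters when levels are later compared for equality or written back to a file.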

Full Factorial not accepting/parsing floats value correctly

As shown in the Flow rate column, 'Flow rate':[0.2,0.4] was label-encoded into 0.0 and 1.0. This differs from what is shown in the documentation.

build.full_fact({'Pressure':[40,55,70],'Temperature':[290, 320, 350],'Flow rate':[0.2,0.4],'Time':[5,8]})
    Pressure  Temperature  Flow rate  Time
0       40.0        290.0        0.0   5.0
1       55.0        290.0        0.0   5.0
2       70.0        290.0        0.0   5.0
3       40.0        320.0        0.0   5.0
4       55.0        320.0        0.0   5.0
5       70.0        320.0        0.0   5.0
6       40.0        350.0        0.0   5.0
7       55.0        350.0        0.0   5.0
8       70.0        350.0        0.0   5.0
9       40.0        290.0        1.0   5.0
10      55.0        290.0        1.0   5.0
22      55.0        320.0        0.0   8.0
23      70.0        320.0        0.0   8.0
24      40.0        350.0        0.0   8.0
25      55.0        350.0        0.0   8.0
26      70.0        350.0        0.0   8.0
27      40.0        290.0        1.0   8.0
28      55.0        290.0        1.0   8.0
29      70.0        290.0        1.0   8.0
30      40.0        320.0        1.0   8.0
31      55.0        320.0        1.0   8.0
32      70.0        320.0        1.0   8.0
33      40.0        350.0        1.0   8.0
34      55.0        350.0        1.0   8.0
35      70.0        350.0        1.0   8.0

What was shown in Documentation:

DOE Full Factorial

The Full Factorial DOE algorithm is repeating some experiments.
Have you faced this issue?

Negative floats return incorrect table

When generating a Latin hypercube sampling (simple or space filled), input factors with negative value levels return a table with incorrect values.

For example,

build.lhs(
{'a':[-1,-5],
'b':[-3,-6],
'c':[1,2]})

returns a table like (numbers truncated for display)

   a        b      c
 0.5     -1.9     1.0
-0.76    -0.78    1.95
-0.04    -2.14    1.43

where the values for a and b are clearly out of bounds for the provided levels, whereas c is correct.

Python 3.8.13
doepy version 0.0.1 installed from pip
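For comparison, here is a minimal standalone sketch of a simple Latin hypercube that stays within bounds regardless of the sign or order of the levels. It is not doepy's implementation; the `lhs` function below and its `num_samples` and `seed` parameters are assumptions made for illustration. The key step is taking the min and max of each factor before scaling the unit-interval samples.

```python
import random

def lhs(factor_ranges, num_samples, seed=None):
    """Simple Latin hypercube: one sample per equal-width stratum for every factor."""
    rng = random.Random(seed)
    columns = {}
    for name, levels in factor_ranges.items():
        low, high = min(levels), max(levels)
        # One uniform draw inside each of num_samples strata of [0, 1).
        unit = [(i + rng.random()) / num_samples for i in range(num_samples)]
        rng.shuffle(unit)
        # Scale from [0, 1) to [low, high); works for negative and unordered levels.
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in columns} for i in range(num_samples)]

samples = lhs({'a': [-1, -5], 'b': [-3, -6], 'c': [1, 2]}, num_samples=3, seed=0)
for row in samples:
    print(row)
```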

Fix a couple of typos in the readme.md

There's a couple of typos I can fix ("supporitng" etc.).

Fractional Factorial design changes level values

Using the code below, I would expect a design with 8 experiments in which the min and max levels are used for each attribute. That works, and I get a design that makes sense except for one item: the levels for G2 are changed from 0.2 and 0.4 to 0 and 1. This behavior does not change when I add a middle level, change the order of the attributes in the design space, or change the name of attribute G2. It does work, however, when I change the values to 2 and 4. It seems that when one of the levels is below 1, the levels are changed to 0 and 1.

My code:

from doepy import build

# Define the design space
design_space = {'P_CG_substance':['P','CG'],
                'P_CG_level':[1,2,3],
                'AF':[1, 1.5, 2],
                'MX':[1.25, 1.5, 2],
                'G2':[0.2, 0.4],
                }

print(design_space)

# Build the design
design = build.frac_fact_res(design_space)

# In the design for column P_CG_substance, replace 0 with P and 1 with CG
design['P_CG_substance'] = design['P_CG_substance'].replace({0:'P', 1:'CG'})

# Print the design
print(design)

# Print the number of experiments
print(f'number of experiments is {len(design)}')

Expected result:
  P_CG_substance  P_CG_level   AF    MX   G2
0              P         1.0  1.0  2.00  0.4
1             CG         1.0  1.0  1.25  0.2
2              P         3.0  1.0  1.25  0.4
3             CG         3.0  1.0  2.00  0.2
4              P         1.0  2.0  2.00  0.2
5             CG         1.0  2.0  1.25  0.4
6              P         3.0  2.0  1.25  0.2
7             CG         3.0  2.0  2.00  0.4

What I get:
  P_CG_substance  P_CG_level   AF    MX   G2
0              P         1.0  1.0  2.00  1.0
1             CG         1.0  1.0  1.25  0.0
2              P         3.0  1.0  1.25  1.0
3             CG         3.0  1.0  2.00  0.0
4              P         1.0  2.0  2.00  0.0
5             CG         1.0  2.0  1.25  1.0
6              P         3.0  2.0  1.25  0.0
7             CG         3.0  2.0  2.00  1.0

Support for parameters with string values

Is there a way to prevent string values from being converted to floats? For example, if I define string_param = ['a', 'b', 'c']:

from doepy import build

build.full_fact({
    'int_param':[1, 2, 3],
    'float_param': [0.1, 0.2, 0.3],
    'string_param': ['a', 'b', 'c'],
})

the values for string_param are implicitly converted to 0.0, 1.0, 2.0 respectively. Instead I'd like to get the plain strings 'a', 'b', 'c'.
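One possible workaround is to map the codes back to the original labels after the design is built. The sketch below assumes, as described above, that each code is the position of the level in the list that was passed in. `decode_levels` is a hypothetical helper, and the coded rows are written by hand to stand in for the library's output.

```python
def decode_levels(rows, levels, columns):
    """Replace positional codes (0.0, 1.0, ...) with the original level values."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            # Assumption: the code is the index of the level in the original list.
            new_row[col] = levels[col][int(row[col])]
        decoded.append(new_row)
    return decoded

levels = {'string_param': ['a', 'b', 'c']}
coded = [
    {'int_param': 1.0, 'string_param': 0.0},
    {'int_param': 2.0, 'string_param': 2.0},
]
decoded = decode_levels(coded, levels, ['string_param'])
print(decoded)
```

With a pandas DataFrame, `df.to_dict("records")` produces rows in the shape this helper expects.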