
doepy's Introduction

Welcome 👋

Hello! This is Tirtha. I am an explorer.

Work

I am working as VP, AI/ML, at Rhombus Power Inc., where I am building exciting and critically important solutions with AI, Data, and Math.

Before this, I was a Data Science and Solutions Engineering Manager at Adapdix Corp, bringing the power of AI/ML to the Edge for Industry 4.0 and the next-generation Smart Factory.

Even before that, I was a Sr. Principal Engineer at ON Semiconductor (also known as onsemi), developing power semiconductor technologies and applying AI/ML to semiconductor product and technology development.

At its core, my work is translating customer business problems into data-driven problems and helping build solutions.

Currently...

  • 🔭 I'm currently working on: lectures/workshops, courses, and spreading knowledge about machine learning/statistical modeling. In particular, I am serving as the Track Chair of the "AI Optimization" track for the ValleyML AI Expo 2021. I am also developing course content for the ValleyML Fellowship program.

  • 🌱 I'm currently learning: ML flow management tools, Ray Serve and distributed computing, and how AI/ML applies to various aspects of the Industrial IoT sector.

  • 👯 I'm looking to collaborate on: data science/ML books, probably using Jupyter Book and the Leanpub platform.

Books, lectures, articles

I regularly publish highly cited articles on data science and machine learning topics on leading platforms such as Towards Data Science, KDnuggets, and Analytics Vidhya.

I also teach IEEE/ACM workshops on data science/machine learning.

My first data science book, Data Wrangling with Python, was published in February 2019. In the future, I hope to self-publish a second book on hands-on mathematics/statistics for data scientists.

Skills

Open-source


My open-source projects span the following topics:

  • general data analytics
  • machine learning
  • deep learning
  • computer vision and image processing
  • math and statistics
  • synthetic data generation, etc.

I have published multiple Python packages related to data analytics and statistical modeling. See this page for my projects.


Contribution to the technical community

Currently, I am on the organizing committee of the ValleyML AI Expo 2021.

I served on the Technical Content Committee for the Open Data Science Conference (ODSC) West, 2020.

In 2015, I was elevated to the grade of IEEE Senior Member for my contributions to power electronics. I have authored or co-authored more than 25 peer-reviewed transactions and conference papers, 2 monographs/book chapters, and 4 U.S. patents. Here is my Google Scholar Page.

I also serve as a Track/Topic Chair on the technical program committees of numerous IEEE conferences. I am the co-chair of the Semiconductor Committee of the Power Supply Manufacturers' Association (PSMA).

doepy's People

Contributors

dwflanagan, tirthajyoti


doepy's Issues

Parsing factor levels for two leveled designs

Thank you for this great work.
I think the code snippet used to trim factor_level_ranges in each two-level design should take the min and max values of the array and reject duplicate values, something like:
for key in factor_level_ranges:
    if len(factor_level_ranges[key]) != 2:
        # Compute both end points before overwriting the list
        low, high = min(factor_level_ranges[key]), max(factor_level_ranges[key])
        factor_level_ranges[key] = [low, high]
        print(
            f"{key} had more than two levels. Assigning the end point to the high level."
        )
    if factor_level_ranges[key][0] == factor_level_ranges[key][1]:
        rep_value = factor_level_ranges[key][0]
        raise ValueError(f"duplicate value '{rep_value}' found in key '{key}'")
Furthermore, this code should be wrapped in a function or a decorator, since it is reused in many places in your code.
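Wrapped in a function as suggested, the check might look like the sketch below. It assumes the ranges are a dict of lists, as in the snippet above. The name `trim_to_two_levels` is hypothetical and not part of doepy. Unlike the loop, the function builds a new dict instead of mutating the caller's, and it also orders two-element lists as [low, high].

```python
def trim_to_two_levels(factor_level_ranges: dict) -> dict:
    """Reduce each factor to [low, high] for a two-level design.

    Hypothetical helper, not part of doepy. Returns a new dict and leaves the
    input unchanged; two-element lists are also reordered as [low, high].
    """
    trimmed = {}
    for key, levels in factor_level_ranges.items():
        # Take both end points from the untouched list
        low, high = min(levels), max(levels)
        if len(levels) != 2:
            print(f"{key} had more than two levels. Using the min and max as the two levels.")
        if low == high:
            raise ValueError(f"duplicate value '{low}' found in key '{key}'")
        trimmed[key] = [low, high]
    return trimmed


print(trim_to_two_levels({"Pressure": [70, 40, 55], "Time": [5, 8]}))
```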

replicates?

Hello,
How should replication be handled? Is the DOE designer capable of designing replicated experiments?
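If the designer has no replication option, one workaround is to replicate a finished design with pandas. The idea is to stack n copies, tag each copy, and randomize the run order. The sketch below is minimal; `replicate_design` and the `Replicate` column name are hypothetical, not doepy API.

```python
import pandas as pd

def replicate_design(design: pd.DataFrame, n_replicates: int,
                     shuffle: bool = True, seed=None) -> pd.DataFrame:
    """Stack n_replicates copies of a design and tag each copy (hypothetical helper)."""
    copies = [design.assign(Replicate=r + 1) for r in range(n_replicates)]
    replicated = pd.concat(copies, ignore_index=True)
    if shuffle:
        # Randomize the run order so replicates are not run back-to-back
        replicated = replicated.sample(frac=1, random_state=seed).reset_index(drop=True)
    return replicated


design = pd.DataFrame({"Pressure": [40, 70], "Time": [5, 8]})
print(replicate_design(design, n_replicates=3, seed=1))
```

Randomizing the combined run order is the usual choice, because running all copies of one setting consecutively would confound replicate-to-replicate variation with drift over time.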

Fractional Factorial design changes level values

Using the code below, I would expect a design with 8 experiments in which the min and max levels are used for each attribute. That works, and I get a design that makes sense except for one item: the levels for G2 are changed from 0.2 and 0.4 into 0 and 1. This behavior does not change when I add a middle level, change the order of the attributes in the design space, or rename attribute G2. It does work, however, when I change the values to 2 and 4. It seems that when one of the levels is below 1, the levels are changed to 0 and 1.

My code:

from doepy import build

# Define the design space
design_space = {'P_CG_substance': ['P', 'CG'],
                'P_CG_level': [1, 2, 3],
                'AF': [1, 1.5, 2],
                'MX': [1.25, 1.5, 2],
                'G2': [0.2, 0.4],
                }

print(design_space)

# Build the design
design = build.frac_fact_res(design_space)

# In the design for column P_CG_substance, replace 0 with P and 1 with CG
design['P_CG_substance'] = design['P_CG_substance'].replace({0: 'P', 1: 'CG'})

# Print the design
print(design)

# Print the number of experiments
print(f'number of experiments is {len(design)}')

Expected result:
  P_CG_substance  P_CG_level   AF    MX   G2
0              P         1.0  1.0  2.00  0.4
1             CG         1.0  1.0  1.25  0.2
2              P         3.0  1.0  1.25  0.4
3             CG         3.0  1.0  2.00  0.2
4              P         1.0  2.0  2.00  0.2
5             CG         1.0  2.0  1.25  0.4
6              P         3.0  2.0  1.25  0.2
7             CG         3.0  2.0  2.00  0.4

What I get:
  P_CG_substance  P_CG_level   AF    MX   G2
0              P         1.0  1.0  2.00  1.0
1             CG         1.0  1.0  1.25  0.0
2              P         3.0  1.0  1.25  1.0
3             CG         3.0  1.0  2.00  0.0
4              P         1.0  2.0  2.00  0.0
5             CG         1.0  2.0  1.25  1.0
6              P         3.0  2.0  1.25  0.0
7             CG         3.0  2.0  2.00  1.0
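Until the encoding is fixed, a coded column can be mapped back to its real levels after the design is built. This works much like the replace() call already used for P_CG_substance. The sketch assumes the whole column holds level indices 0, 1, ..., k-1, which is true for G2 in the output above. `decode_levels` is a hypothetical helper.

```python
import pandas as pd

def decode_levels(design: pd.DataFrame, column: str, levels: list) -> pd.DataFrame:
    """Map level indices 0, 1, ..., k-1 in `column` back to the actual level values.

    Hypothetical helper. Only valid when EVERY entry in the column is a level
    index; it raises if any entry is not a valid index.
    """
    index_to_level = dict(enumerate(levels))
    # dict lookup treats 0.0 and 0 as the same key, so float-coded indices work
    decoded = design[column].map(lambda v: index_to_level.get(v))
    if decoded.isna().any():
        raise ValueError(f"column '{column}' contains values that are not level indices")
    result = design.copy()
    result[column] = decoded
    return result


coded = pd.DataFrame({"G2": [1.0, 0.0, 1.0, 0.0]})
print(decode_levels(coded, "G2", [0.2, 0.4]))
```

In the script above, this would be `design = decode_levels(design, 'G2', [0.2, 0.4])`. If the library later returns the real levels, the call raises instead of silently re-mapping them.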

prob_distribution

Hello, is it true that the prob_distribution argument is not used for the LHS model in the build_lhs function?
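For reference, a probability-distribution option on an LHS usually works in two steps. First draw a stratified uniform sample, then push each column through the inverse CDF of the target distribution. The minimal sketch below uses SciPy's `qmc.LatinHypercube` (SciPy 1.7 or later) and `norm.ppf`. It is not doepy's `build_lhs`; the (mean, std_dev) input format and the `lhs_normal` name are assumptions.

```python
import pandas as pd
from scipy.stats import norm, qmc

def lhs_normal(factors: dict, num_samples: int) -> pd.DataFrame:
    """Latin hypercube sample whose columns follow normal distributions.

    Hypothetical helper; `factors` maps each name to a (mean, std_dev) pair.
    """
    # Stratified uniform sample in [0, 1): one point per 1/n stratum in each column
    u = qmc.LatinHypercube(d=len(factors)).random(n=num_samples)
    # Inverse-CDF (ppf) transform maps each uniform column onto the requested normal
    columns = {
        name: norm.ppf(u[:, i], loc=mean, scale=std)
        for i, (name, (mean, std)) in enumerate(factors.items())
    }
    return pd.DataFrame(columns)


print(lhs_normal({"Temperature": (320.0, 10.0), "Pressure": (55.0, 5.0)}, num_samples=8))
```

Because the transform is monotone, the one-point-per-stratum property of the uniform sample carries over to equal-probability strata of the normal distribution.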

Analyzing doe?

Hello,
Once the DOE is generated and the results are acquired, how should the results be analyzed?
Do you plan to integrate corresponding statistical analysis tools (such as ANOVA, regression, etc.) into doepy, or can you advise which Python libraries to use for analyzing the results?
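As one pointer: for a two-level design, main effects can be estimated by ordinary least squares on factors coded as -1/+1. Below is a minimal NumPy sketch with a synthetic response (not real data). statsmodels offers the same fit with ANOVA tables and p-values via `statsmodels.formula.api.ols` and `statsmodels.api.stats.anova_lm`.

```python
import numpy as np
import pandas as pd

# Coded 2^2 full factorial (-1 = low level, +1 = high level)
design = pd.DataFrame({"A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1]})
# SYNTHETIC response, generated here only to illustrate the fit; use measured results instead
y = 5.0 + 2.0 * design["A"] - 1.5 * design["B"]

# Least-squares fit of y = b0 + bA*A + bB*B (main effects only)
X = np.column_stack([np.ones(len(design)), design["A"], design["B"]])
coef, *_ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)
b0, bA, bB = coef

# With -1/+1 coding, the classical effect (mean at high minus mean at low) is 2 * coefficient
print(f"intercept={b0:.2f}, effect(A)={2 * bA:.2f}, effect(B)={2 * bB:.2f}")
```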

DOE Full Factorial

The Full Factorial DOE algorithm is repeating some experiments.
Have you faced this issue?
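A quick way to check a generated design for repeats is to compare it against two properties of a full factorial. It should have exactly as many runs as the product of the level counts, and it should contain no duplicate rows. The sketch below uses pandas `DataFrame.duplicated`; `check_full_factorial` is a hypothetical helper.

```python
from math import prod

import pandas as pd

def check_full_factorial(design: pd.DataFrame, factor_levels: dict) -> pd.DataFrame:
    """Report a wrong run count and return every row that occurs more than once."""
    expected_runs = prod(len(levels) for levels in factor_levels.values())
    if len(design) != expected_runs:
        print(f"expected {expected_runs} runs, got {len(design)}")
    # keep=False marks every copy of a repeated row, not just the later ones
    return design[design.duplicated(keep=False)]


suspect = pd.DataFrame({"A": [1, 2, 1, 2], "B": [3, 3, 3, 4]})
print(check_full_factorial(suspect, {"A": [1, 2], "B": [3, 4]}))
```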

Support for parameters with string values

Is there a way to prevent string values from being converted to floats? For example, if I define string_param=['a', 'b', 'c']:

from doepy import build

build.full_fact({
    'int_param': [1, 2, 3],
    'float_param': [0.1, 0.2, 0.3],
    'string_param': ['a', 'b', 'c'],
})

the values for string_param are implicitly converted to 0.0, 1.0, 2.0 respectively. Instead I'd like to get the plain strings 'a', 'b', 'c'.
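As a workaround, a full factorial can be built directly from the level lists with `itertools.product`. This never encodes the values, so strings and floats come back as given. The sketch below is minimal and is not doepy's implementation; the name `full_factorial` is hypothetical. The first factor varies fastest, to mirror the row order of the `full_fact` outputs on this page.

```python
from itertools import product

import pandas as pd

def full_factorial(factor_levels: dict) -> pd.DataFrame:
    """Every combination of the given levels, with the first factor varying fastest.

    Hypothetical helper, not doepy's implementation. Levels are copied as given,
    so ints, floats, and strings come back unchanged rather than as 0, 1, 2, ...
    """
    names = list(factor_levels)
    # product() varies its LAST argument fastest, so pass the factors in reverse
    # order and flip each combination back into the original column order
    reversed_levels = [factor_levels[name] for name in reversed(names)]
    rows = [combo[::-1] for combo in product(*reversed_levels)]
    return pd.DataFrame(rows, columns=names)


df = full_factorial({
    "int_param": [1, 2, 3],
    "float_param": [0.1, 0.2, 0.3],
    "string_param": ["a", "b", "c"],
})
print(df.head())
```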

Constraints

Hi there, is it possible to introduce constraints such as A + B + C = 1?
Thank you!
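One standard way to handle a sum-to-one constraint such as A + B + C = 1 is a mixture design. The sketch below builds a {q, m} simplex-lattice design, in which every proportion is a multiple of 1/m and the components sum to 1. It is independent of doepy, and `simplex_lattice` is a hypothetical helper.

```python
from itertools import product

import pandas as pd

def simplex_lattice(components: list, m: int) -> pd.DataFrame:
    """{q, m} simplex-lattice mixture design: proportions in steps of 1/m summing to 1."""
    # Enumerate integer compositions k_1 + ... + k_q = m, then scale each by 1/m
    rows = [
        tuple(k / m for k in combo)
        for combo in product(range(m + 1), repeat=len(components))
        if sum(combo) == m
    ]
    return pd.DataFrame(rows, columns=components)


print(simplex_lattice(["A", "B", "C"], m=2))
```

Filtering integer compositions before dividing keeps the constraint check exact; checking `sum(...) == 1` on the floats instead could wrongly reject points such as thirds.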

Full Factorial not accepting/parsing floats value correctly

As shown in the Flow rate column, 'Flow rate':[0.2,0.4] was label-encoded into 0.0 and 1.0, which differs from what is shown in the documentation.

build.full_fact({'Pressure':[40,55,70],'Temperature':[290, 320, 350],'Flow rate':[0.2,0.4],'Time':[5,8]})
    Pressure  Temperature  Flow rate  Time
0       40.0        290.0        0.0   5.0
1       55.0        290.0        0.0   5.0
2       70.0        290.0        0.0   5.0
3       40.0        320.0        0.0   5.0
4       55.0        320.0        0.0   5.0
5       70.0        320.0        0.0   5.0
6       40.0        350.0        0.0   5.0
7       55.0        350.0        0.0   5.0
8       70.0        350.0        0.0   5.0
9       40.0        290.0        1.0   5.0
10      55.0        290.0        1.0   5.0
...
22      55.0        320.0        0.0   8.0
23      70.0        320.0        0.0   8.0
24      40.0        350.0        0.0   8.0
25      55.0        350.0        0.0   8.0
26      70.0        350.0        0.0   8.0
27      40.0        290.0        1.0   8.0
28      55.0        290.0        1.0   8.0
29      70.0        290.0        1.0   8.0
30      40.0        320.0        1.0   8.0
31      55.0        320.0        1.0   8.0
32      70.0        320.0        1.0   8.0
33      40.0        350.0        1.0   8.0
34      55.0        350.0        1.0   8.0
35      70.0        350.0        1.0   8.0

What was shown in the documentation: [screenshot not preserved]

Negative floats return incorrect table

When generating a Latin hypercube sampling (simple or space filled), input factors with negative value levels return a table with incorrect values.

For example,

build.lhs(
{'a':[-1,-5],
'b':[-3,-6],
'c':[1,2]})

returns a table like (numbers truncated for display)

   a        b      c
 0.5     -1.9     1.0
-0.76    -0.78    1.95
-0.04    -2.14    1.43

Here the values for a and b are clearly out of bounds for the provided levels, whereas c is correct.

Python 3.8.13
doepy version 0.0.1 installed from pip
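For comparison, a basic Latin hypercube sample stays in bounds regardless of the sign or order of the levels if it scales each factor into the interval between its smallest and largest level. The sketch below is pure NumPy and is not doepy's code; `lhs_bounded` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def lhs_bounded(factor_ranges: dict, num_samples: int, seed=None) -> pd.DataFrame:
    """Basic Latin hypercube sample scaled into each factor's [min, max] interval."""
    rng = np.random.default_rng(seed)
    columns = {}
    for name, levels in factor_ranges.items():
        # Sort the end points so [-1, -5] and [-5, -1] describe the same interval
        low, high = min(levels), max(levels)
        # One point per stratum: shuffled stratum indices plus uniform jitter inside each
        strata = rng.permutation(num_samples)
        u = (strata + rng.random(num_samples)) / num_samples
        columns[name] = low + u * (high - low)
    return pd.DataFrame(columns)


print(lhs_bounded({"a": [-1, -5], "b": [-3, -6], "c": [1, 2]}, num_samples=10, seed=42).round(2))
```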

Float precision

In doe_function.construct_df, the pandas DataFrame is forced to dtype=float32. This leads to a non-negligible loss of accuracy, and double-precision floats are now standard in almost all applications.

Is there any reason for this dtype specification?

  • If yes, I would propose to make it an option
  • If no, I would propose to remove it and let pandas handle the dtype from the given array.

Thanks.
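The precision loss is easy to see by storing the same values with `dtype="float32"` and with pandas' default `float64`. The illustration below is independent of `construct_df` itself and reuses level values from the examples on this page (40.1, 8.123, 55.0, 0.2).

```python
import numpy as np
import pandas as pd

values = np.array([[40.1, 8.123], [55.0, 0.2]])

df32 = pd.DataFrame(values, columns=["Pressure", "Flow rate"], dtype="float32")
df64 = pd.DataFrame(values, columns=["Pressure", "Flow rate"])  # dtype inferred: float64

# float32 keeps only about 7 significant digits, so the stored level no longer equals the input
print(float(df32.loc[0, "Flow rate"]), float(df64.loc[0, "Flow rate"]))
print(df32.dtypes.tolist(), df64.dtypes.tolist())
```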

Critical issue: full_fact fails after recent update of pandas to 1.5.0

This code from your README file worked until pandas was updated to 1.5.0.

from doepy import build
df = build.full_fact(
{
    'Pressure':[40,55,70],
    'Temperature':[290, 320, 350],
    'Flow rate':[0.2,0.4], 
    'Time':[5,8]}
)
print(df)

It gives now the following. Note how flow rate is returning the values 0 and 1 and not 0.2 and 0.4.

    Pressure  Temperature  Flow rate  Time
0       40.0        290.0        0.0   5.0
1       55.0        290.0        0.0   5.0
...
7       55.0        350.0        0.0   5.0
8       70.0        350.0        0.0   5.0
9       40.0        290.0        1.0   5.0
10      55.0        290.0        1.0   5.0
...
16      55.0        350.0        1.0   5.0
17      70.0        350.0        1.0   5.0

I did a bit of digging around and the issue seems to depend on the factor values. Once you switch a factor to float, the value is no longer returned; just 0, 1, 2, and so on:

from doepy import build
df = build.full_fact(
{
    'Pressure':[40.1,55,70],
    'Flow rate':[5, 8.123], 
})
print(df)

   Pressure  Flow rate
0       0.0        5.0
1      55.0        5.0
2      70.0        5.0
3       0.0        1.0
4      55.0        1.0
5      70.0        1.0

Python version used was 3.9
