Disparate Censorship Expectation-Maximization (DCEM)

This is the official code release for "From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions" (ICML '24).

Sometimes, when training a machine learning model, labels are missing or noisy. Sometimes, they're even missing or noisy in biased ways. We proposed Disparate Censorship Expectation-Maximization (DCEM) to help mitigate the effects of such labeling on model performance and bias.

This algorithm was inspired by a real-world problem in machine learning for healthcare: often, we assume that individuals who didn't get a diagnostic test are negative. This assumption has been documented in papers analyzing COVID-19 testing, sepsis diagnostic definitions, and more.

If you're interested in reproducing the results from our paper, please check out the README in the legacy folder, which contains all of our experimental code. If you're interested in applying DCEM to your own problems or learning how to implement DCEM yourself, we recommend consulting this repo.

Quickstart

Run pip install -r requirements.txt to get the required dependencies.

Here's how you can apply DCEM to your own data:

    from dcem import DCEM

    # setup (what you should initialize)
    train_data, test_data = ... # load this however you like
    X_tr, A_tr, T_tr, Y_obs_tr = train_data
    Y_tr = ... # optional: true labels, typically only known for synthetic data
    propensity_model = ... # must be a nn.Module or implement `.fit()`
    outcome_model = ... # must be a nn.Module
    model = DCEM(propensity_model, outcome_model)

    # training
    model.fit(X_tr, A_tr, T_tr, Y_obs_tr, Y=Y_tr) # Y is optional; pass it when true labels are known (e.g., synthetic data)

    # inference
    X_te, *_ = test_data
    preds = model.predict_proba(X_te)[:, 1]

We also provide a full demo of DCEM with example synthetic data in demo.py.

How DCEM works (informally)

DCEM is designed for situations where labeling decisions are noisy and potentially biased (in a fairness/equity sense). In such situations, if we fit a model to simply predict the observed outcome, we'll probably also learn to replicate these labeling biases. That's often undesirable.

Enter DCEM: our method leverages variables that we assume do not affect the outcome of interest (such as "protected attributes") to learn a model that "compensates" for labeling biases. For a comprehensive and formal treatment of DCEM, please see our paper.
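To make the setting concrete, here is a minimal simulation sketch of biased labeling. It is not the data generator from the paper or from demo.py. The function name, the single feature, and the group-specific testing rates (0.4 and 0.8) are all illustrative assumptions. The variable names X, A, T, and Y_obs mirror the Quickstart. The attribute A has no effect on the true outcome Y, but it does affect who gets tested, and untested individuals are recorded as negative.

```python
import numpy as np

def simulate_disparate_censorship(n=20_000, seed=0):
    """Hypothetical toy generator: A affects testing (T) but not the outcome (Y)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)                # one feature per individual
    A = rng.integers(0, 2, size=n)        # binary group attribute, independent of X
    p_y = 1.0 / (1.0 + np.exp(-X))        # P(Y=1 | X); note that A does not appear
    Y = rng.binomial(1, p_y)              # true outcome (unobserved in practice)
    p_t = np.where(A == 1, 0.8, 0.4)      # assumed testing rates: group 0 is tested less
    T = rng.binomial(1, p_t)              # 1 if the individual was tested
    Y_obs = T * Y                         # untested individuals are labeled negative
    return X, A, T, Y, Y_obs

X, A, T, Y, Y_obs = simulate_disparate_censorship()
for a in (0, 1):
    mask = A == a
    print(f"group {a}: true rate {Y[mask].mean():.3f}, "
          f"observed rate {Y_obs[mask].mean():.3f}")
```

In this toy setup both groups have about the same true positive rate, but the less-tested group has a lower observed positive rate. A model fit directly to Y_obs would tend to inherit that gap.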

Contributing/reporting issues

Contributions. We absolutely welcome contributions. This is a fairly bare-bones implementation of DCEM, but we hope to grow the functionality. Please raise an issue to discuss potential extensions or features you'd like to see before submitting a pull request.

Issues/bugs. All models are wrong; some are useful. Sadly, the same is not true of code. Please open an issue to discuss any potential bugs!

Special thanks to Gregory Kondas for help with testing the code!

Contact

Please reach out to ctrenton at umich dot edu or file a GitHub issue if you have any questions about our work. Thank you!
