Issues from h2oai's article-information-2019 repository

Nick coding TODOs

Discrimination testing for simulated data
Discrimination testing for mortgage data
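One common measure used in discrimination testing is the adverse impact ratio (AIR): the rate of favorable outcomes for a protected group divided by the rate for a reference group. The sketch below is only an illustration of that measure, not the testing code planned here. The (group, prediction) record layout, the group labels, the 0.5 cutoff, and the choice to treat high predictions as favorable are all assumptions.

```python
def adverse_impact_ratio(records, protected, reference, cutoff=0.5):
    """Favorable-outcome rate of `protected` divided by that of `reference`.

    Each record is a (group, predicted_probability) pair. A prediction at or
    above `cutoff` counts as the favorable outcome (assumed for illustration;
    for some targets the favorable outcome is the low prediction instead).
    """
    def favorable_rate(group):
        scores = [p for g, p in records if g == group]
        if not scores:
            raise ValueError(f"no records for group {group!r}")
        return sum(p >= cutoff for p in scores) / len(scores)

    reference_rate = favorable_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(protected) / reference_rate


# Made-up scores for two hypothetical groups "a" and "b".
records = [("a", 0.9), ("a", 0.7), ("a", 0.2), ("a", 0.6),
           ("b", 0.8), ("b", 0.3), ("b", 0.4), ("b", 0.1)]
air = adverse_impact_ratio(records, protected="b", reference="a")
```

Here group "a" has a favorable rate of 3/4 and group "b" of 1/4, so the ratio is 1/3.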

Navdeep MGBM TODOs

Features

Input feature list
Global feature importance rankings for @kmontgom2400

For mortgage data

Modelling:
- Unconstrained GBM trained w/ random grid search and 5-fold CV with training/CV and test AUC, Accuracy, RMSE
- constrained MGBM trained w/ random grid search and 5-fold CV with training/CV and test AUC, Accuracy, RMSE
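For reference, the three metrics named in the items above can be computed from labels and predicted probabilities as in the minimal sketch below. It assumes a binary 0/1 target and a 0.5 cutoff for accuracy; neither is stated in this issue, and the function names are illustrative rather than taken from the repo.

```python
import math


def accuracy(y_true, y_prob, cutoff=0.5):
    """Share of rows whose thresholded prediction matches the 0/1 label."""
    hits = sum((p >= cutoff) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def rmse(y_true, y_prob):
    """Root mean squared error between labels and predicted probabilities."""
    squared = sum((y - p) ** 2 for y, p in zip(y_true, y_prob))
    return math.sqrt(squared / len(y_true))


def auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of rows, which is fine for a sketch but slow on large test sets.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
metrics = {
    "auc": auc(y_true, y_prob),
    "accuracy": accuracy(y_true, y_prob),
    "rmse": rmse(y_true, y_prob),
}
```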
MLI:
- Mean local feature importance values across quantiles of predictions (by Shapley) for MGBM for top 3 features
- Partial dependence curves for MGBM for top 3 features
- ICE curves at quantiles of predictions for MGBM for top 3 features
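The last two items relate to each other as follows: an ICE curve traces one row's prediction as a single feature sweeps over a grid with the other inputs held fixed, and a partial dependence curve is the average of those ICE curves over rows. The sketch below assumes the model is exposed as a plain scoring function on dict-shaped rows; toy_predict and the feature names are hypothetical stand-ins, not the MGBM.

```python
def ice_curve(predict, row, feature, grid):
    """Predictions for one row as `feature` takes each grid value in turn."""
    return [predict({**row, feature: value}) for value in grid]


def partial_dependence(predict, rows, feature, grid):
    """Pointwise average of the ICE curves of all rows."""
    curves = [ice_curve(predict, row, feature, grid) for row in rows]
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_predict(row):
    """Hypothetical stand-in for a trained model's scoring function."""
    return 2.0 * row["x1"] + row["x1"] * row["x2"]


rows = [{"x1": 0.0, "x2": 1.0}, {"x1": 1.0, "x2": 3.0}]
grid = [0.0, 0.5, 1.0]

ice_first = ice_curve(toy_predict, rows[0], "x1", grid)
pd_x1 = partial_dependence(toy_predict, rows, "x1", grid)
```

Because of the x1 * x2 interaction in the toy function, the two rows have ICE curves with different slopes, which is the kind of structure a partial dependence curve alone would average away.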

For simulated data

Modelling:
- Unconstrained GBM trained w/ random grid search and 5-fold CV with training/CV and test AUC, Accuracy, RMSE
- constrained MGBM trained w/ random grid search and 5-fold CV with training/CV and test AUC, Accuracy, RMSE
MLI:
- Mean local feature importance values across quantiles of predictions (by Shapley) for MGBM for top 3 features
- Partial dependence curves for MGBM for top 3 features
- ICE curves at quantiles of predictions for MGBM for top 3 features

Fairness

Pandas frame of predictions and row IDs for the test data for @nickpschmidt to conduct discrimination testing for MGBM
- Models are saved under /models. Just need to use the .predict() function on data to get predictions.
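A minimal sketch of the hand-off frame, assuming pandas is available. StandInModel is a hypothetical placeholder for a model loaded from /models (only a .predict() method is assumed, as noted above), and the column names row_id, x1, and prediction are illustrative.

```python
import pandas as pd


class StandInModel:
    """Hypothetical placeholder for a saved model; only .predict() is assumed."""

    def predict(self, frame):
        return (0.1 * frame["x1"] + 0.2).tolist()


# Made-up test data with an explicit row ID column.
test = pd.DataFrame({"row_id": [101, 102, 103], "x1": [1.0, 2.0, 3.0]})
model = StandInModel()

# Keep the row IDs next to the predictions so they can be joined back
# to demographic fields for discrimination testing.
preds = pd.DataFrame({
    "row_id": test["row_id"].to_numpy(),
    "prediction": model.predict(test.drop(columns="row_id")),
})
```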

Kim XNN TODOs

Features

Input feature list (hopefully informed by @navdeep-G using GBM Shapley)

For simulated data

Unconstrained feedforward ANN trained w/ 5-fold CV with training/CV and test AUC, Accuracy, RMSE, logloss
XNN trained w/ 5-fold CV with training/CV and test AUC, Accuracy, RMSE, logloss
Mean local feature importance values across quintiles of predictions (by Shapley or gradient-based) for XNN for top 5 features
Ridge function curves for XNN for top 5 features
ICE curves at quintiles of predictions for XNN for top 5 features
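One possible reading of "mean local feature importance values across quintiles of predictions" is to sort rows by prediction, split them into five equal-sized bins, and average one feature's local contribution values within each bin. The sketch below follows that reading. It assumes the local contributions (Shapley or gradient-based) are already computed elsewhere, and the function name is hypothetical.

```python
def mean_contribution_by_quintile(predictions, contributions):
    """Average one feature's local contributions within prediction quintiles.

    `contributions[i]` is the local importance of the feature for row i,
    computed by some other method. Returns five means, lowest quintile first.
    """
    n = len(predictions)
    if n != len(contributions):
        raise ValueError("predictions and contributions differ in length")
    if n < 5:
        raise ValueError("need at least five rows to form quintiles")
    order = sorted(range(n), key=lambda i: predictions[i])
    means = []
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        means.append(sum(contributions[i] for i in members) / len(members))
    return means


# Made-up predictions and local contribution values for ten rows.
predictions = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
contributions = [4.0, -2.0, 0.5, -1.0, 2.0, -1.0, 3.0, 0.0, 1.5, 6.0]
quintile_means = mean_contribution_by_quintile(predictions, contributions)
```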

For mortgage data

Unconstrained feedforward ANN trained w/ 5-fold CV with training/CV and test AUC, Accuracy, RMSE
XNN trained w/ 5-fold CV with training/CV and test AUC, Accuracy, RMSE
Mean local feature importance values across quintiles of predictions (by Shapley or gradient-based) for XNN for top 5 features
Ridge function curves for XNN for top 5 features
ICE curves at quintiles of predictions for XNN for top 5 features

Fairness

Pandas frame of predictions and row IDs for the test data for @nickpschmidt to conduct discrimination testing for XNN

Add README to data directory

A simple README in the data directory pointing out information about certain files and scripts will make it easier to navigate this subdirectory.

Add data simulation code to repo

Code is here
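For orientation only, and not the linked code: the issue list for this repo mentions a program that creates Friedman 1 simulated data. The standard Friedman 1 signal is y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 plus Gaussian noise, with inputs drawn uniformly from [0, 1]. The sketch below generates that continuous response. The function names, the default of 10 features, the noise level, and the seed are assumptions, and any further processing the repo's program applies to the response is not shown.

```python
import math
import random


def friedman1_signal(x):
    """Noise-free Friedman 1 response; only the first five inputs matter."""
    return (10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])


def simulate_friedman1(n_rows, n_features=10, noise_sd=1.0, seed=0):
    """Return (features, response) pairs with uniform [0, 1] inputs."""
    if n_features < 5:
        raise ValueError("Friedman 1 needs at least five features")
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_features)]
        y = friedman1_signal(x) + rng.gauss(0.0, noise_sd)
        rows.append((x, y))
    return rows


sample = simulate_friedman1(n_rows=100)
```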

Raw HMDA data missing & provided raw data file is not used in scripts

The file noted as the raw/input HMDA data in the README is: hmda_lar_2018_orig_mtg_sample.csv

However, no script uses this file and I am not sure where it comes from, although its fields match at least a subset of those listed in the data dictionary. I am using it because it is the only copy of the source data I can find. It is a nice dataset, and I want to build some Disparate Impact Analysis demos with it.

hmda_sample_for_paper.py reads its source data from a hardcoded path that is not available, so the README and the scripts appear to be out of sync. That source data also contains many more fields than hmda_lar_2018_orig_mtg_sample.csv, with more refined information: instead of just derived_race (a summary of the race of the applicant and co-applicant), it has 5 race fields for each applicant (primary and co-).

Add README to notebooks directory

A simple README in the notebooks directory pointing out information about certain files and scripts will make it easier to navigate this subdirectory.

Patrick writing TODOs

Methods and Materials section

Introduce unconstrained and constrained models
Send for editing
Introduce explanatory methods
Send for editing
Minimize self-plagiarism in explanatory methods section
Send for editing
Add useful Python packages into software section
Send for editing

Results section

Simulated data results
Send for editing
Mortgage data results
Send for editing

Discussion section

The Burgeoning Ecosystem of Interpretable Models section
Send for editing
Intersectionality of Interpretability, Fairness, and Security section
Send for editing

Conclusion

Conclusion
Send for editing

General

General editing and double checking

Review Zest article

Article is here
Let's add our findings to this issue

If none are found, we can opt for the credit card dataset and COMPAS

article-information-2019's Issues

Nick coding TODOs

Navdeep MGBM TODOs

Nick writing TODOs

Kim XNN TODOs

Add README to data directory

Add data simulation code to repo

Raw HMDA data missing & provided raw data file is not used in scripts

Add README to notebooks directory

Patrick writing TODOs

Review Zest article

Post first-pass version of program that creates Friedman 1 Simulated data

Add MCC, sensitivity, specificity, and F1 for MGBM and GBM

Explore economic literature for promising datasets
