
Explaining Credit Decisions with Amazon SageMaker

Given the increasing complexity of machine learning models, the need for model explainability has been growing lately. Some governments have also introduced stricter regulations that mandate a right to explanation from machine learning models. In this solution, we take a look at how Amazon SageMaker can be used to explain individual predictions from machine learning models.

As an example application, we classify credit applications and predict whether the credit would be paid back or not (often called a credit default). We train a tree-based LightGBM model using Amazon SageMaker and explain its predictions using a game-theoretic approach called SHAP (SHapley Additive exPlanations).

Ultimately, we deploy an endpoint that returns the model prediction and the associated explanation.
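
For illustration, the sketch below shows one way to query such an endpoint with boto3. The endpoint name and the JSON field names are assumptions made for the example, not the solution's actual contract.

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# A hypothetical credit application, keyed by the solution's feature names.
application = {"credit__amount": 5000, "employment__duration": 24}

response = runtime.invoke_endpoint(
    EndpointName="sm-soln-explaining-credit-decisions-endpoint",  # placeholder name
    ContentType="application/json",
    Body=json.dumps(application),
)
result = json.loads(response["Body"].read())
print(result["prediction"])   # assumed field: probability of default
print(result["explanation"])  # assumed field: per-feature contributions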

What is an explanation?

Given a set of input features used to describe a credit application (e.g. credit__amount and employment__duration), an explanation reflects the contribution of each feature to the model's final prediction. We include a number of visualizations that can be used to see how each feature pushes the risk of credit default up or down for an individual application. The repository includes an example of an exported explanation report.
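
As a hedged sketch of how such per-feature contributions can be computed, LightGBM can return SHAP-style contributions natively via pred_contrib=True (the data below is a stand-in for the solution's prepared features):

import lightgbm as lgb
import numpy as np
import pandas as pd

# Stand-in data: two of the solution's feature names with random values.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "credit__amount": rng.integers(250, 20000, size=500),
    "employment__duration": rng.integers(0, 40, size=500),
})
y = rng.integers(0, 2, size=500)
booster = lgb.train({"objective": "binary", "verbose": -1}, lgb.Dataset(X, label=y))

# pred_contrib=True returns one SHAP-style contribution per feature (in
# log-odds space), plus a final column holding the expected (baseline) value.
contribs = booster.predict(X.iloc[[0]], pred_contrib=True)[0]
for name, value in zip(X.columns, contribs[:-1]):
    print(f"{name:>22}: {value:+.3f}")
print(f"{'baseline':>22}: {contribs[-1]:+.3f}")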

Getting Started

You will need an AWS account to use this solution; sign up for one at https://aws.amazon.com if you don't have one already.

To run this JumpStart 1P Solution and have the infrastructure deployed to your AWS account, you will need an active SageMaker Studio instance (see Onboard to Amazon SageMaker Studio). When your Studio instance is Ready, use the instructions in SageMaker JumpStart to 1-Click Launch the solution.

The solution artifacts are included in this GitHub repository for reference.

Note: Solutions are available in most regions, including us-west-2 and us-east-1.

Caution: Cloning this GitHub repository and running the code manually could lead to unexpected issues! Use the AWS CloudFormation template instead. You'll get an Amazon SageMaker notebook instance that's been correctly set up and configured to access the other resources in the solution.

Contents

  • cloudformation/
    • explaining-credit-decisions.yaml: Creates the AWS CloudFormation stack for the solution.
    • glue.yaml: Used to create AWS Glue components.
    • sagemaker.yaml: Used to create Amazon SageMaker components.
    • solution-assistant.yaml: Used to prepare demonstration datasets and clean up resources.
  • dataset/
  • glue/
    • etl_job.py: Used by the AWS Glue job to transform datasets.
  • lambda/
    • datasets.py: Used to generate synthetic datasets.
    • lambda_function.py: Solution Assistant create and delete logic.
    • requirements.txt: Describes Python package requirements of the AWS Lambda function.
  • sagemaker/
    • requirements.txt: Describes Python package requirements of the Amazon SageMaker Notebook instance.
    • setup.py: Defines the Python package used in the solution.
    • containers/
      • dashboard/
      • model/
        • Dockerfile: Describes custom Docker image hosted on Amazon ECR.
        • requirements.txt: Describes Python package requirements of the Docker image.
        • entry_point.py: Used by Amazon SageMaker for training and endpoint hosting.
    • notebooks/
      • notebook.ipynb: Orchestrates the solution.
    • package/
      • config.py: Stores and retrieves project configuration.
      • utils.py: Various utility functions for scripts and/or notebooks.
      • visuals.py: Contains explanation visualizations.
      • data/
        • datasets.py: Contains functions for reading datasets.
        • glue.py: Manages the AWS Glue workflow of crawling datasets and running jobs.
        • schemas.py: Schema creation and data validation.
      • machine_learning/
        • preprocessing.py: Scikit-learn steps to pre-process data for model.
        • training.py: Scikit-learn steps to train and test model.
      • sagemaker/
        • containers.py: Manages the Docker workflow of building and pushing images to Amazon ECR.
        • estimator_fns.py: Contains functions used by estimator.
        • explainer_fns.py: Contains functions used by explainer.
        • predictor_fns.py: Contains functions used by predictor.
        • predictors.py: Custom predictor for using JSON endpoint from notebook.

Architecture

As part of the solution, the following services are used:

  • AWS CloudFormation: deploys and deletes the solution's infrastructure.
  • AWS Glue: crawls and transforms the datasets.
  • AWS Lambda: generates the synthetic datasets and handles solution clean-up.
  • Amazon SageMaker: trains the model and hosts the endpoint.
  • Amazon ECR: hosts the custom Docker image.
  • Amazon S3: stores the datasets and model artifacts.

Costs

You are responsible for the cost of the AWS services used while running this solution.

As of 6th April 2020 in the US West (Oregon) region, the cost to:

  • prepare the dataset with AWS Glue is ~$0.75.
  • train the model using an Amazon SageMaker training job on ml.c5.xlarge is ~$0.02.
  • host the model using an Amazon SageMaker endpoint on ml.c5.xlarge is $0.119 per hour.
  • run an Amazon SageMaker notebook instance is $0.0582 per hour.

All prices are subject to change. See the pricing webpage for each AWS service you will be using in this solution.

Cleaning Up

When you've finished with this solution, make sure that you delete all unwanted AWS resources. AWS CloudFormation can be used to automatically delete all standard resources that have been created by the solution and notebook. Go to the AWS CloudFormation Console, and delete the parent stack. Choosing to delete the parent stack will automatically delete the nested stacks.
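
If you prefer to clean up programmatically, here is a minimal sketch using boto3; the stack name is a placeholder, not necessarily the one your deployment used.

import boto3

# Deleting the parent stack also removes its nested stacks. The stack name
# below is a placeholder for whatever name you chose at launch time.
cloudformation = boto3.client("cloudformation")
cloudformation.delete_stack(StackName="sagemaker-soln-explaining-credit-decisions")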

Caution: You need to manually delete any extra resources that you may have created in this notebook. Some examples include: extra Amazon S3 buckets (in addition to the solution's default bucket), extra Amazon SageMaker endpoints (using a custom name), and extra Amazon ECR repositories.

Customizing

Our solution is easily customizable. You can customize the:

  • synthetic datasets (see lambda/datasets.py).
  • data pre-processing and model training steps (see sagemaker/package/machine_learning/).
  • explanation visualizations (see sagemaker/package/visuals.py).

FAQ

What is explainability?

Model explainability is the degree to which humans can understand the cause of decisions made by a machine learning model. Many methods now exist for formulating explanations of complex models that are both interpretable and faithful.

Why is explainability useful?

An explanation gives stakeholders a way to understand the relationships and patterns learned by a machine learning model. As an example, an explanation can be used to verify that the model relies on meaningful relationships rather than spurious ones. Such checks can give stakeholders more confidence in the reliability and robustness of the model for real-world deployments, which is critical for building trust in the system. When issues are found, explanations often give scientists a strong indication of what needs to be fixed in the dataset or model training procedure, saving significant time and money. Other serious issues, such as social discrimination and bias, can be clearly flagged by an explanation.

Why is credit default prediction useful? And how does explainability help?

Given a credit application from a bank customer, the aim of the bank is to predict whether or not the customer will pay back the credit in accordance with their repayment plan. When a customer can't pay back their credit, often called a 'default', the bank loses money and the customer's credit score will be impacted. On the other hand, denying trustworthy customers credit also has a set of negative impacts.

Using accurate machine learning models to classify the risk of a credit application can help find a good balance between these two scenarios, but this provides no comfort to those customers who have been denied credit. Using explainability methods, it's possible to determine actionable factors that had a negative impact on the application. Customers can then take action to increase their chance of obtaining credit in subsequent applications.
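
As a small sketch of what surfacing such actionable factors could look like, assuming per-feature contributions like those shown earlier (the feature names and numbers are made up):

# Hypothetical per-feature contributions for a denied application.
contribs = {
    "credit__amount": +0.20,
    "employment__duration": +0.08,
    "age": -0.03,
}

# Surface the features that pushed the default risk up, largest first.
risk_factors = sorted(
    (item for item in contribs.items() if item[1] > 0),
    key=lambda item: item[1],
    reverse=True,
)
print(risk_factors)  # [('credit__amount', 0.2), ('employment__duration', 0.08)]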

What is SHAP?

SHAP (Lundberg et al. 2017) stands for SHapley Additive exPlanations. 'Shapley' relates to a game-theoretic concept called Shapley values, which is used to create the explanations. A Shapley value describes the marginal contribution of each 'player' when considering all possible 'coalitions'. In a machine learning context, a Shapley value describes the marginal contribution of each feature when considering all possible sets of features. 'Additive' relates to the fact that these Shapley values can be summed to give the final model prediction.

As an example, we might start off with a baseline credit default risk of 10%. Given a set of features, we can calculate the Shapley value for each feature. Summing together all the Shapley values, we might obtain a cumulative value of +30%. Given the same set of features, we therefore expect our model to return a credit default risk of 40% (i.e. 10% + 30%).
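
The same arithmetic, written out as a quick check (the feature names and Shapley values are made up for the example):

# Baseline credit default risk before seeing any features.
baseline_risk = 0.10

# Hypothetical Shapley values for one application's features.
shap_values = {
    "credit__amount": +0.20,
    "employment__duration": +0.08,
    "age": +0.02,
}

# 'Additive': baseline plus the summed contributions gives the prediction.
prediction = baseline_risk + sum(shap_values.values())
print(f"{prediction:.0%}")  # 40%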

Credits

Our datasets (i.e. credits, people and contacts) were synthetically created from features contained in the German Credit Dataset (UCI Machine Learning Repository). All personal information was generated using Faker.
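
As a small illustration of the kind of Faker calls involved (the exact fields generated by the solution may differ):

from faker import Faker

# Generate synthetic personal information; none of it belongs to a real person.
fake = Faker()
person = {
    "name": fake.name(),
    "address": fake.address(),
    "email": fake.email(),
}
print(person)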

License

This project is licensed under the Apache-2.0 License.

Issues

The Download was cancelled for this solution.

Getting "The Download was cancelled for this solution." when attempting to launch this JumpStart solution. I have verified that at least one other JumpStart solution builds without issue in my environment.

To reproduce:

  1. Configure SageMaker domain and user profile with full permissions
  2. Open SageMaker Studio and browse for JumpStart Explain Credit Decisions
  3. Click Launch

Parameter SolutionPrefix failed to satisfy constraint:

Hi there, it's a very good repo to exercise with, but it's not working. I tried to create the stack, but it won't let me. The following is the error: Parameter SolutionPrefix failed to satisfy constraint: Only allowed to use lowercase letters, hyphens and/or numbers. Should start with 'sm-soln-explaining-' for permission management. Should be 39 characters or less. Will you please help me? I am new to IaC and AWS. I look forward to hearing from you. Thanks

Cell fails in 5_dashboard

Calling get_notebook_name() fails with KeyError: 'SolutionPrefix'

port = 8501
url = get_dashboard_url(port)
!echo Dashboard URL: {url}
!(cd ../containers/dashboard/src && streamlit run app.py --server.port {port})

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in <module>
      1 port = 8501
----> 2 url = get_dashboard_url(port)
      3 get_ipython().system('echo Dashboard URL: {url}')
      4 get_ipython().system('(cd ../containers/dashboard/src && streamlit run app.py --server.port {port})')

in get_dashboard_url(port)
      8
      9 def get_dashboard_url(port):
---> 10     notebook_name = get_notebook_name()
     11     region_name = sagemaker.Session().boto_region_name
     12     return f"https://{notebook_name}.notebook.{region_name}.sagemaker.aws/proxy/{port}/"

in get_notebook_name()
      2     with open('/opt/ml/metadata/resource-metadata.json') as openfile:
      3         data = json.load(openfile)
----> 4     notebook_name = data['SolutionPrefix']
      5     return notebook_name
      6     #return 'sm-soln-explaining-credit-decisions-notebook'

KeyError: 'SolutionPrefix'
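
A likely fix, assuming the standard notebook instance metadata file (which exposes the instance name under the 'ResourceName' key rather than 'SolutionPrefix'), would be:

import json

# A possible fix (an assumption, untested here): read the notebook instance
# name from the 'ResourceName' key of the metadata file.
def get_notebook_name():
    with open('/opt/ml/metadata/resource-metadata.json') as openfile:
        data = json.load(openfile)
    return data['ResourceName']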

Cannot create Database

Here is the error:
[fae5d684-f4c7-3e09-a5a4-7a95657088bf] ERROR : Insufficient Lake Formation permission(s) on sagemaker-soln-ecd-js-98cdwa-database (Database name: sagemaker-soln-ecd-js-98cdwa-database) (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: 2c810ee2-dd2c-43ce-89ad-f778cdfaf48c;

AWS Glue Workflow failed. Unable to verify existence of default database.

When running:

glue.wait_for_workflow_finished(config.GLUE_WORKFLOW, glue_run_id)

I get the following error:

Exception: AWS Glue Workflow failed. Check the workflow logs on the console.

After checking the AWS Glue Workflow, the error is on the Job stage.

Clicking on the Job shows the following error:

AnalysisException: 'java.lang.RuntimeException: MetaException(message:Unable to verify existence of default database: com.amazonaws.services.glue.model.AccessDeniedException: Please migrate your Catalog to enable access to this database (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: b5f4516c-0451-4a55-a015-b64f7761e21f));'

Can't connect to Streamlit server

This could be related to #5.

When "fixing" get_notebook_name() like so:

def get_notebook_name():
    #with open('/opt/ml/metadata/resource-metadata.json') as openfile:
    #    data = json.load(openfile)
    #notebook_name = data['SolutionPrefix']
    #return notebook_name
    return 'sm-soln-explaining-credit-decisions-notebook'

The notebook generates a proper URL for the streamlit server:

https://sm-soln-explaining-credit-decisions-notebook.notebook.us-east-2.sagemaker.aws/proxy/8501/

I can successfully open that URL, but I get stuck on a "Please wait" message. No dashboard is actually displayed. I tried several browsers, deactivated browser plugins, etc.

I can see Streamlit running on the notebook instance:

top - 10:39:31 up  1:04,  0 users,  load average: 0.16, 0.11, 0.14
Tasks: 107 total,   1 running,  77 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.0%us,  0.5%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.5%st
Mem:   3980096k total,  3788248k used,   191848k free,  1384196k buffers
Swap:        0k total,        0k used,        0k free,   654160k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20161 ec2-user  20   0 1347m 242m  71m S  4.7  6.2   0:20.70 streamlit
 9317 ec2-user  20   0  915m 121m  33m S  2.0  3.1   0:19.51 python
 5287 ec2-user  20   0 1320m 114m  24m S  0.7  2.9   0:21.50 jupyter-noteboo
 4679 root      20   0  742m  43m  29m S  0.3  1.1   0:08.29 amazon-cloudwat
 4696 root      20   0  742m  42m  29m S  0.3  1.1   0:07.74 amazon-cloudwat

AWS Glue Workflow failed

The step failed after running:

glue.wait_for_workflow_finished(config.GLUE_WORKFLOW, glue_run_id)

It says:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-350db00730e8> in <module>()
----> 1 glue.wait_for_workflow_finished(config.GLUE_WORKFLOW, glue_run_id)

~/SageMaker/src/package/data/glue.py in wait_for_workflow_finished(workflow_name, workflow_run_id, polling_frequency)
     39     workflow_name, workflow_run_id, polling_frequency=10
     40 ):
---> 41     while not workflow_finished(workflow_name, workflow_run_id):
     42         sleep(polling_frequency)
     43         print('.', end='', flush=True)

~/SageMaker/src/package/data/glue.py in workflow_finished(workflow_name, workflow_run_id)
     29         else:
     30             raise Exception(
---> 31                 "AWS Glue Workflow failed. "
     32                 "Check the workflow logs on the console."
     33             )

Exception: AWS Glue Workflow failed. Check the workflow logs on the console.

In the Glue console, it says:

AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found);'
