Papers on Explainable Artificial Intelligence

This is an ongoing attempt to consolidate interesting efforts in the area of understanding / interpreting / explaining / visualizing a pre-trained ML model.

GUI tools

DeepVis: Deep Visualization Toolbox. Yosinski et al. 2015 code | pdf
SWAP: Generate adversarial poses of objects in a 3D space. Alcorn et al. 2018 code | pdf

Libraries

CNN visualizations (activation maximization, PyTorch)
iNNvestigate (heatmaps, Keras)
DeepExplain (heatmaps, Keras)
Lucid (activation maximization, heatmaps, TensorFlow)

Surveys

Methods for Interpreting and Understanding Deep Neural Networks. Montavon et al. 2017 pdf
Visualizations of Deep Neural Networks in Computer Vision: A Survey. Seifert et al. 2017 pdf
How convolutional neural network see the world - A survey of convolutional neural network visualization methods. Qin et al. 2018 pdf
A brief survey of visualization methods for deep learning models from the perspective of Explainable AI. Chalkiadakis 2018 pdf
A Survey Of Methods For Explaining Black Box Models. Guidotti et al. 2018 pdf
Understanding Neural Networks via Feature Visualization: A survey. Nguyen et al. 2019 pdf

Definitions of Interpretability

The Mythos of Model Interpretability. Lipton 2016 pdf
Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim 2017 pdf
Interpretable machine learning: definitions, methods, and applications. Murdoch et al. 2019 pdf

Books

A Guide for Making Black Box Models Explainable. Molnar 2019 pdf

A. Explaining inner-workings

A1. Visualizing Preferred Stimuli

Synthesizing images / Activation Maximization

AM: Visualizing higher-layer features of a deep network. Erhan et al. 2009 pdf
DeepVis: Understanding Neural Networks through Deep Visualization. Yosinski et al. 2015 pdf | url
MFV: Multifaceted Feature Visualization: Uncovering the different types of features learned by each neuron in deep neural networks. Nguyen et al. 2016 pdf | code
DGN-AM: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen et al. 2016 pdf | code
PPGN: Plug and Play Generative Networks. Nguyen et al. 2017 pdf | code
Feature Visualization. Olah et al. 2017 url
Diverse feature visualizations reveal invariances in early layers of deep neural networks. Cadena et al. 2018 pdf

Real images / Segmentation Masks

Visualizing and Understanding Recurrent Networks. Karpathy et al. 2015 pdf
Object Detectors Emerge in Deep Scene CNNs. Zhou et al. 2015 pdf
Understanding Deep Architectures by Interpretable Visual Summaries pdf

A2. Inverting Neural Networks

Understanding Deep Image Representations by Inverting Them pdf
Inverting Visual Representations with Convolutional Networks pdf
Neural network inversion beyond gradient descent pdf

A3. Distilling DNNs into more interpretable models

Interpreting CNNs via Decision Trees pdf
Distilling a Neural Network Into a Soft Decision Tree pdf
Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et al. 2018 pdf
Improving the Interpretability of Deep Neural Networks with Knowledge Distillation. Liu et al. 2018 pdf

A4. Quantitatively characterizing hidden features

TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. Kim et al. 2018 pdf | code
- Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks. Ghorbani et al. 2019 pdf
SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Raghu et al. 2017 pdf | code
A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens. Saini et al. 2018 pdf
Network Dissection: Quantifying Interpretability of Deep Visual Representations. Bau et al. 2017 url | pdf
- GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. Bau et al. 2018 pdf
- Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks. Fong & Vedaldi 2018 pdf

A5. Network surgery

How Important Is a Neuron? Dhamdhere et al. 2018 pdf

A6. Sensitivity analysis

NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models. Liu et al. 2018 pdf

B. Decision explanations

B1. Heatmaps

White-box / Gradient-based

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. Nie et al. 2018 pdf
A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks pdf
CAM: Learning Deep Features for Discriminative Localization. Zhou et al. 2016 code | web
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju et al. 2017 pdf
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. Chattopadhyay et al. 2017 pdf | code
LRP: Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation pdf
- DTD: Explaining NonLinear Classification Decisions With Deep Taylor Decomposition pdf
Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks. Seo et al. 2018 pdf
Interpretable Explanations of Black Boxes by Meaningful Perturbation. Fong & Vedaldi 2017 pdf
Integrated Gradients: Axiomatic Attribution for Deep Networks. Sundararajan et al. 2017 pdf | code
I-GOS: Visualizing Deep Networks by Optimizing with Integrated Gradients. Qi et al. 2019 pdf
Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. Oramas et al. 2019 pdf

Black-box / Perturbation-based

RISE: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. 2018 pdf
LIME: Why should I trust you?: Explaining the predictions of any classifier. Ribeiro et al. 2016 pdf | blog

Evaluating heatmaps

The (Un)reliability of saliency methods. Kindermans et al. 2018 pdf
Sanity Checks for Saliency Maps. Adebayo et al. 2018 pdf

B2. Learning to explain

Learning how to explain neural networks: PatternNet and PatternAttribution pdf
Deep Learning for Case-Based Reasoning through Prototypes pdf
Unsupervised Learning of Neural Networks to Explain Neural Networks pdf
Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions pdf
- Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations pdf
Towards robust interpretability with self-explaining neural networks. Alvarez-Melis and Jaakkola 2018 pdf

C. Counterfactual explanations (what would have happened)

Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections. Zhang et al. 2018 pdf

D. Unclassified

Yang, S. C. H., & Shafto, P. Explainable Artificial Intelligence via Bayesian Teaching. NIPS 2017 pdf
Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation pdf
ICADx: Interpretable computer aided diagnosis of breast masses. Kim et al. 2018 pdf
Neural Network Interpretation via Fine Grained Textual Summarization. Guo et al. 2018 pdf
LS-Tree: Model Interpretation When the Data Are Linguistic. Chen et al. 2019 pdf
