
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

BEFORE YOU RUN OUR CODE

We appreciate your interest in our work and in trying out our code. We have noticed several cases where incorrect configuration leads to poor detection and mitigation performance. If you observe detection performance far below what we presented in the paper, please feel free to open an issue in this repo or contact any of the authors directly. We are more than happy to help you debug your experiment and find the correct configuration. Also feel free to look through previous issues in this repo: someone may have run into the same problem, and there may already be a fix.

ABOUT

This repository contains the code implementation of the paper "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", published at the IEEE Symposium on Security and Privacy 2019. The slides are here.

DEPENDENCIES

Our code is implemented and tested on Keras with the TensorFlow backend. The following packages are used by our code.

  • keras==2.2.2
  • numpy==1.14.0
  • tensorflow-gpu==1.10.1
  • h5py==2.6.0

Our code is tested on Python 2.7.12 and Python 3.6.8.
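
If you use pip, one way to install the pinned versions above (assuming a CUDA-capable machine for tensorflow-gpu) is:

pip install keras==2.2.2 numpy==1.14.0 tensorflow-gpu==1.10.1 h5py==2.6.0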

HOWTO

Injecting Backdoor

For the GTSRB model, the backdoor injection code is under the injection folder of this repo. You will need to download the training data from here.

Reverse Engineering

We include a sample script demonstrating how to perform the reverse-engineering technique on an infected model. Several parameters need to be configured before running the code; they are described below, followed by an illustrative configuration sketch.

  • GPU device: if you are using a GPU, specify which GPU you would like to use by setting the DEVICE variable.
  • Data/model/result folder: if you are running the code on your own models and datasets, specify the paths to the data, model, and result files via the corresponding variables.
  • Meta info: if you are testing your own model, specify the correct meta information about the task, including input size, preprocessing method, total # of labels, and infected label (optional).
  • Configuration of the optimization: several parameters control the optimization process, including learning rate, batch size, # of samples per iteration, total # of iterations, initial value for the weight balance, etc. The defaults fit all models we tested, and you should be able to reuse the same configuration for your task.
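
Below is a minimal configuration sketch. The variable names mirror the conventions of the sample script but are illustrative; verify them against gtsrb_visualize_example.py before running.

# Illustrative configuration (assumed names; check the script before use)
DEVICE = '0'                               # which GPU to use

DATA_DIR = 'data'                          # folder containing the dataset
MODEL_DIR = 'models'                       # folder containing the infected model
RESULT_DIR = 'results'                     # folder where reversed triggers are stored

# Meta info for the GTSRB task
IMG_ROWS, IMG_COLS, IMG_COLOR = 32, 32, 3  # input size
NUM_CLASSES = 43                           # total # of labels
Y_TARGET = 33                              # infected label (optional)

# Optimization configuration
BATCH_SIZE = 32                            # batch size
LR = 0.1                                   # learning rate
STEPS = 1000                               # total # of optimization iterations
INIT_COST = 1e-3                           # initial weight balance between loss terms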

To execute the python script, simply run

python gtsrb_visualize_example.py

We have already included a sample infected model for traffic sign recognition in the repo, along with the testing data used for reverse engineering. The sample code uses this model/dataset by default. The entire process of examining all labels in the traffic sign recognition model takes roughly 10 min. All reverse-engineered triggers (mask, delta) will be stored under RESULT_DIR. You can also restrict the analysis to specific labels by changing the list of target labels in the script. A reversed trigger can be applied back to a clean image as sketched below.
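
Each reversed trigger is a (mask, delta) pair that defines the trigger-injection function from the paper, A(x, m, Δ) = (1 − m) · x + m · Δ. The helper below is a hypothetical sketch (apply_trigger is not part of the repo) showing how a stored trigger would stamp a clean image:

import numpy as np

def apply_trigger(x, mask, pattern):
    # Implements A(x, m, delta) = (1 - m) * x + m * delta.
    # x and pattern are HxWxC float arrays; mask holds values in [0, 1].
    if mask.ndim == 2:
        mask = mask[..., np.newaxis]  # broadcast an HxW mask over channels
    return (1.0 - mask) * x + mask * pattern

Pixels where the mask is close to 1 are overwritten by the reversed pattern; everywhere else the original image passes through unchanged, which is why the L1 norm of the mask measures the trigger's size.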

Anomaly Detection

We use an anomaly detection algorithm based on MAD (Median Absolute Deviation). A very useful explanation of MAD can be found here. Our implementation reads all reversed triggers and detects any outlier with an abnormally small size (L1 norm); a sketch of the computation follows the configuration list below. Before you execute the code, please make sure the following configuration is correct.

  • Path to reversed triggers: you can specify the folder containing all reversed triggers here. The filename format in the sample code is consistent with the previous reverse-engineering code. Our code only checks for anomalies among the reversed triggers under the specified folder, so be sure to include all triggers you would like to analyze.
  • Meta info: configure the correct meta information about the task and model, so our analysis code can load the reversed triggers with the correct shape. You need to specify the input shape and the total # of labels of the model.
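
The detection itself is compact enough to sketch. The following is a simplified reimplementation (not the repo's exact code) of the MAD-based anomaly index, using the consistency constant 1.4826 and the detection threshold of 2 from the paper:

import numpy as np

def anomaly_index(l1_norms):
    # Scale MAD by 1.4826 so it estimates the standard deviation
    # under a normality assumption.
    l1_norms = np.asarray(l1_norms, dtype=float)
    median = np.median(l1_norms)
    mad = 1.4826 * np.median(np.abs(l1_norms - median))
    return np.abs(l1_norms.min() - median) / mad

def flag_labels(l1_norms, threshold=2.0):
    # Flag labels whose triggers are abnormally SMALL: below the median
    # and more than `threshold` scaled deviations away from it.
    l1_norms = np.asarray(l1_norms, dtype=float)
    median = np.median(l1_norms)
    mad = 1.4826 * np.median(np.abs(l1_norms - median))
    return [i for i, norm in enumerate(l1_norms)
            if norm < median and abs(norm - median) / mad > threshold]

An anomaly index above 2 marks the model as infected; plugging the numbers from the sample output below into anomaly_index reproduces the reported value of 3.652.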

To execute the sample code, simply run

python mad_outlier_detection.py

Below is a snippet of the output of outlier detection on the infected GTSRB model (traffic sign recognition).

median: 64.466667, MAD: 13.238736
anomaly index: 3.652087
flagged label list: 33: 16.117647

Line #2 shows that the final anomaly index is 3.652, which suggests the model is infected. Line #3 shows that the outlier detection algorithm flags only 1 label (label 33), whose trigger has an L1 norm of 16.1.
