topofilter

A Topological Filter for Learning with Label Noise (NeurIPS 2020, Paper)

Requirements

  • PyTorch 0.4.1 (other versions have not been tested)
  • Python 3.6 (needed to compile the C++ code; other 3.x versions should also work)
  • scipy 1.1.0 (this version is needed for the computation of the distribution mode)
  • termcolor and similar small packages, all easily installed with pip

Usage

  • Compile the C++ code for computing the connected components: in the folder ref, run ./compile_pers_lib.sh. By default the script requires Python 3.6; if you are using another Python version, modify the command inside compile_pers_lib.sh accordingly.
  • Run train.py with a command like the following:
python train.py --every 5 --start_clean 30 --k_cc 4 --k_outlier 32 --seed 77 --type uniform --noise 0.4 --patience 65 --gpus 0 --dataset cifar10 --zeta 0.5
  • For the point cloud dataset, pass pc as the dataset and net arguments:
python train.py --gpus 2 --every 5 --start_clean 10 --k_outlier 30 --k_cc 100 --noise 0.8 --type uniform --patience 60 --seed 77 --dataset pc --net pc --milestone 35 --zeta 2
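
When several noise levels need to be run, the CIFAR-10 command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: build_command is a hypothetical helper, the flag names and default values are copied from the CIFAR-10 example, and the noise levels in the loop are illustrative only. The other hyperparameters are held fixed here and may need retuning for other noise levels.

```python
import sys


def build_command(noise, dataset="cifar10", noise_type="uniform", seed=77, gpus=0):
    """Assemble a train.py command line (hypothetical helper, not in the repo).

    All flag names and default values come from the CIFAR-10 example command.
    """
    args = {
        "every": 5,
        "start_clean": 30,
        "k_cc": 4,
        "k_outlier": 32,
        "seed": seed,
        "type": noise_type,
        "noise": noise,
        "patience": 65,
        "gpus": gpus,
        "dataset": dataset,
        "zeta": 0.5,
    }
    cmd = [sys.executable, "train.py"]
    for flag, value in args.items():
        cmd += ["--" + flag, str(value)]
    return cmd


# Print one command per noise level; the levels are illustrative assumptions.
for noise in (0.2, 0.4, 0.6):
    print(" ".join(build_command(noise)))
```

Each printed line can be pasted into a shell, or the returned list can be passed to subprocess.run from the directory that contains train.py.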

The major parameters are:

  • every: the frequency of data collection.
  • start_clean: when to start data collection.
  • k_cc: the parameter for computing the KNN graph when finding the largest connected component.
  • k_outlier: the parameter for computing the KNN graph when applying zeta filtering.
  • seed: the random seed.
  • type: the noise type. Options include uniform and asym.
  • noise: the noise level.
  • patience: a trick to save training time. If the validation accuracy shows no obvious improvement for N consecutive epochs, training stops.
  • gpus: which GPU to run on.
  • dataset: which dataset to use. Options include cifar10, cifar100 and pc. The pc dataset can be downloaded from https://github.com/charlesq34/pointnet
  • zeta: the parameter for zeta filtering. Note that when zeta is set to a value > 1.0, majority voting is used to remove the outliers. This sometimes achieves better performance.
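
The patience rule described above can be sketched as a small counter. This is an illustration under assumptions, not the logic in train.py: PatienceStopper and its min_delta threshold (one possible reading of "obvious improvement") are hypothetical names, and the accuracy values are made up for the example.

```python
class PatienceStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical threshold for an "obvious" gain
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True to stop training."""
        if val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Made-up validation accuracies, one per epoch.
history = [0.50, 0.55, 0.54, 0.55, 0.53, 0.60]
stopper = PatienceStopper(patience=3)
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.step(acc):
        stopped_at = epoch
        break
```

With patience=3 the loop stops before reaching the final, higher accuracy, which shows the trade-off: a small patience saves time but can end training just before a late improvement.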

Practical tip: for extremely noisy scenarios (noise level >= 0.8), we observe that setting a larger k_cc works better.
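
How k changes the largest connected component of a KNN graph can be seen on toy data. The sketch below is pure Python and is only an illustration: it is not the repository's C++ implementation, the function names are hypothetical, and the edge rule (two points are linked if either is among the other's k nearest neighbors) is an assumption that may differ from the actual code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph(points, k):
    """Undirected KNN graph as adjacency sets (assumed edge rule: either direction)."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def largest_component(adj):
    """Return the sorted node indices of the largest connected component."""
    seen = set()
    best = []
    for start in range(len(adj)):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    return sorted(best)


# Toy data: two well-separated groups of points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
small_k = largest_component(knn_graph(points, k=1))
large_k = largest_component(knn_graph(points, k=3))
```

On this toy set, k=1 leaves the two groups disconnected, while k=3 links them into a single component, so a larger k retains more points in the largest component.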

Our code will be further improved to make it cleaner and easier to use.

Reference:

@inproceedings{wu2020topological,
  title={A Topological Filter for Learning with Label Noise},
  author={Wu, Pengxiang and Zheng, Songzhu and Goswami, Mayank and Metaxas, Dimitris and Chen, Chao},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Related Works:

  • Error-Bounded Correction of Noisy Labels. In ICML, 2020. [Paper][Code]
  • Learning with Feature Dependent Label Noise: A Progressive Approach. In ICLR, 2021. [Paper][Code]
