
model-vs-human's Introduction


Benchmark · Installation · User experience · Model zoo · Datasets · Credit & citation

modelvshuman: Does your model generalise better than humans?

modelvshuman is a Python toolbox to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with high-quality human comparison data.

🏆 Benchmark

The top-10 models are listed here; training dataset size is indicated in brackets. Additionally, standard ResNet-50 is included as the last entry of the table for comparison. Model ranks are calculated across the full range of 52 models that we tested. If your model scores better than some (or even all) of the models here, please open a pull request and we'll be happy to include it here!

Most human-like behaviour

| winner | model | accuracy difference ↓ | observed consistency ↑ | error consistency ↑ | mean rank ↓ |
|---|---|---|---|---|---|
| 🥇 | ViT-22B-384: ViT-22B (4B) | .018 | .783 | .258 | 1.67 |
| 🥈 | CLIP: ViT-B (400M) | .023 | .758 | .281 | 3 |
| 🥉 | ViT-22B-560: ViT-22B (4B) | .022 | .739 | .281 | 3.33 |
| 👏 | SWSL: ResNeXt-101 (940M) | .028 | .752 | .237 | 6 |
| 👏 | BiT-M: ResNet-101x1 (14M) | .034 | .733 | .252 | 7 |
| 👏 | BiT-M: ResNet-152x2 (14M) | .035 | .737 | .243 | 7.67 |
| 👏 | ViT-L (1M) | .033 | .738 | .222 | 9.33 |
| 👏 | BiT-M: ResNet-152x4 (14M) | .035 | .732 | .233 | 10.33 |
| 👏 | BiT-M: ResNet-50x3 (14M) | .040 | .726 | .228 | 12 |
| 👏 | ViT-L (14M) | .035 | .744 | .206 | 12 |
| ... | standard ResNet-50 (1M) | .087 | .665 | .208 | 31.33 |

Highest OOD (out-of-distribution) distortion robustness

| winner | model | OOD accuracy ↑ | rank ↓ |
|---|---|---|---|
| 🥇 | ViT-22B-224: ViT-22B (4B) | .837 | 1 |
| 🥈 | Noisy Student: EfficientNet-L2 (300M) | .829 | 2 |
| 🥉 | ViT-22B-384: ViT-22B (4B) | .798 | 3 |
| 👏 | ViT-L (14M) | .733 | 4 |
| 👏 | CLIP: ViT-B (400M) | .708 | 5 |
| 👏 | ViT-L (1M) | .706 | 6 |
| 👏 | SWSL: ResNeXt-101 (940M) | .698 | 7 |
| 👏 | BiT-M: ResNet-152x2 (14M) | .694 | 8 |
| 👏 | BiT-M: ResNet-152x4 (14M) | .688 | 9 |
| 👏 | BiT-M: ResNet-101x3 (14M) | .682 | 10 |
| ... | standard ResNet-50 (1M) | .559 | 34 |

🔧 Installation

Simply clone the repository to a location of your choice and follow these steps (requires Python 3.8):

  1. Set the repository home path by running the following from the command line:

    export MODELVSHUMANDIR=/absolute/path/to/this/repository/
    
  2. Within the cloned repository, install the package:

    pip install -e .
    

    (The -e option installs the package in editable mode, so that changes to the code, e.g. adding your own model, are reflected in the installed package.)

🔬 User experience

Simply edit examples/evaluate.py as desired and run it. This will test a list of models on out-of-distribution datasets and generate plots (see the sketch below). If you then compile latex-report/report.tex, all the plots will be included in one convenient PDF report.
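As a rough orientation, the script is organised around an evaluation step followed by a plotting step. The sketch below illustrates the typical edit points; the model names, the DEFAULT_DATASETS constant and the keyword arguments are assumptions taken from the shipped example, so defer to examples/evaluate.py itself for the exact interface.

    # Sketch of the typical edit points in examples/evaluate.py (interface
    # assumed from the shipped example; check the file for exact signatures).
    from modelvshuman import Evaluate
    from modelvshuman import constants as c

    def run_evaluation():
        models = ["resnet50", "bagnet33", "simclr_resnet50x1"]  # models to benchmark
        datasets = c.DEFAULT_DATASETS                           # or a custom list of dataset names
        params = {"batch_size": 64, "print_predictions": True, "num_workers": 4}
        Evaluate()(models, datasets, **params)

    if __name__ == "__main__":
        run_evaluation()
        # afterwards, the plotting step in the example writes the figures
        # that latex-report/report.tex pulls into the PDF report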

🐫 Model zoo

The following models are currently implemented:

If you add or implement your own model, please make sure to compute its ImageNet accuracy as a sanity check.

How to load a model

If you just want to load a model from the model zoo, this is what you can do:

    # loading a PyTorch model from the zoo
    from modelvshuman.models.pytorch.model_zoo import InfoMin
    model = InfoMin("InfoMin")

    # loading a Tensorflow model from the zoo
    from modelvshuman.models.tensorflow.model_zoo import efficientnet_b0
    model = efficientnet_b0("efficientnet_b0")

Then, if you have a custom set of images that you want to evaluate the model on, load those (in the example below, called images) and evaluate via:

    output_numpy = model.forward_batch(images)

    # by default, type(output_numpy) is numpy.ndarray, which can be converted to a tensor via:
    import torch
    output_tensor = torch.tensor(output_numpy)
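If you need a starting point for building the images batch used above, here is a minimal sketch based on standard torchvision preprocessing. The resize/crop sizes and normalisation constants are generic ImageNet defaults assumed for illustration, not something the toolbox prescribes, so adjust them to whatever your chosen model expects.

    # Hypothetical preprocessing sketch: build a batch tensor from image files
    # using common ImageNet transforms (assumed, not toolbox-specific).
    import torch
    from PIL import Image
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    paths = ["img1.png", "img2.png"]  # hypothetical image paths
    images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])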

However, if you simply want to run a model through the generalisation datasets provided by the toolbox, we recommend checking the section on User experience.

How to list all available models

All implemented models are registered by the model registry, which can then be used to list all available models of a certain framework with the following method:

    from modelvshuman import models
    
    print(models.list_models("pytorch"))
    print(models.list_models("tensorflow"))

How to add a new model

Adding a new model is possible for standard PyTorch and TensorFlow models. Depending on the framework (pytorch / tensorflow), open modelvshuman/models/<framework>/model_zoo.py. Here, you can add your own model with a few lines of code, similar to how you would load it usually; a sketch of a typical entry follows below. If your model has a custom model definition, create a new subdirectory modelvshuman/models/<framework>/my_fancy_model/ containing fancy_model.py, which you can then import from model_zoo.py via from .my_fancy_model import fancy_model.
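As an illustration, a new entry might look roughly like the sketch below. The register_model decorator and the PytorchModel wrapper mirror the pattern of the existing entries in the PyTorch model_zoo.py, but treat those names, and the checkpoint path, as assumptions and copy the exact pattern from the file itself. After adding the entry, remember the sanity check mentioned above: compute the model's ImageNet accuracy.

    # Hypothetical new entry in modelvshuman/models/pytorch/model_zoo.py.
    # `register_model` and `PytorchModel` are assumed to already be available
    # at the top of that file, as they are for the existing entries.
    import torch
    import torchvision.models

    @register_model("pytorch")
    def my_fancy_model(model_name, *args):
        model = torchvision.models.resnet50(pretrained=True)
        # optionally load your own weights (hypothetical path):
        # model.load_state_dict(torch.load("/path/to/checkpoint.pth"))
        return PytorchModel(model, model_name, *args)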

📁 Datasets

In total, 17 datasets are available, each with human comparison data collected under highly controlled laboratory conditions in the Wichmannlab.

Twelve datasets correspond to parametric or binary image distortions. Top row: colour/grayscale, contrast, high-pass, low-pass (blurring), phase noise, power equalisation. Bottom row: opponent colour, rotation, Eidolon I, II and III, uniform noise. [figure: noise-stimuli]

The remaining five datasets correspond to the following nonparametric image manipulations: sketch, stylized, edge, silhouette, texture-shape cue conflict. [figure: nonparametric-stimuli]

How to load a dataset

Similarly, if you're interested in just loading a dataset, you can do this via:

   from modelvshuman.datasets import sketch      
   dataset = sketch(batch_size=16, num_workers=4)

Note that the datasets are not shipped with the toolbox itself. Instead, each dataset is automatically downloaded the first time a model is evaluated on it (see examples/evaluate.py).
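If you prefer to push these images through a model yourself rather than via the evaluation pipeline, something along the following lines may work. Note that the .loader attribute and the position of the image tensor within each batch are assumptions about the dataset wrapper, so inspect one batch before relying on this sketch.

    # Hypothetical sketch: iterate the dataset and classify each batch with a
    # model from the zoo. The `.loader` attribute and the batch layout are
    # assumed, not guaranteed by the toolbox.
    for batch in dataset.loader:
        images = batch[0]                          # assumed: image tensor comes first
        predictions = model.forward_batch(images)  # numpy array, see "How to load a model"
        print(predictions.argmax(axis=1)[:5])      # class indices of the first few images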

How to list all available datasets

    from modelvshuman import datasets

    print(list(datasets.list_datasets().keys()))

Download raw test images

If you'd like to download the test images yourself, they are available here.

💳 Credit

Psychophysical data were collected by us in the vision laboratory of the Wichmannlab.

That said, we used existing image dataset sources. 12 datasets were obtained from Generalisation in humans and deep neural networks. 4 datasets were obtained from ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. Additionally, we used 1 dataset from Learning Robust Global Representations by Penalizing Local Predictive Power (sketch images from ImageNet-Sketch).

We thank all model authors and repository maintainers for providing the models described above.

Citation

@inproceedings{geirhos2021partial,
  title={Partial success in closing the gap between human and machine vision},
  author={Geirhos, Robert and Narayanappa, Kantharaju and Mitzkus, Benjamin and Thieringer, Tizian and Bethge, Matthias and Wichmann, Felix A and Brendel, Wieland},
  booktitle={{Advances in Neural Information Processing Systems 34}},
  year={2021},
}


model-vs-human's Issues

NaN in .csv files

Hi @rgeirhos, Hope all is well

I've added a new model in model_zoo.py; it returns a resnet50 with a custom checkpoint.
Now in my CSVs I see some NaNs in the "rt" column. Is it ok to have these NaN values?

subj,session,trial,rt,object_response,category,condition,imagename
resnet50d,1,1,NaN,clock,airplane,NaN,airplane1.png
resnet50d,1,2,NaN,knife,airplane,NaN,airplane10.png
resnet50d,1,3,NaN,clock,airplane,NaN,airplane2.png
resnet50d,1,4,NaN,knife,airplane,NaN,airplane3.png
resnet50d,1,5,NaN,clock,airplane,NaN,airplane4.png
...

I am wondering why we have NaN in

"NaN", response[0], category,

thanks

Unable to reproduce results

I followed the guidelines in the README and ran evaluate.py from the examples, but an error occurred indicating that two models could not be found: bagnet33 and simclr_resnet50x1. When I commented out these two models in evaluate.py and ran it again, the LaTeX report did not generate a PDF, and only two .tex files (a benchmark table and an accuracy table, e.g. benchmark_table_humanlike.tex) were generated.

The environment I am using is PyTorch 1.11.0; Ubuntu 20.04.

We urgently need your help.

Question about self-supervised models

Hello, thanks for the great toolbox~

I'm a little confused about the results on self-supervised models like SimCLR, which don't have a class-specific classifier for ImageNet-1k.
So how do you load the classifier weights? Do you load a separately trained classifier (as in the linear-probing protocol) or a fully fine-tuned network (as in the end-to-end fine-tuning protocol)?

And another question:
If I just want to test the shape & texture accuracy mentioned in Intriguing Properties of Vision Transformers, which dataset type should I choose?

Thank you very much~

How to resolve system-related issues?

Hi,

Does this library only support Linux? When trying to load models or datasets, I run into several path issues related to Linux vs. Windows path handling. Is there an easy way to fix them all at once? Addressing them manually is very tedious.

Thanks

not able to reproduce the results

Firstly thanks for sharing the interesting paper and code!

I followed the installation steps, but running python examples/evaluate.py (without editing anything) resulted in the following strange error.
Could the authors give some insight into the reasons?

    Plotting accuracy for dataset colour
    The following model(s) were not found: alexnet
    List of possible models in this dataset:
    ['bagnet33' 'resnet50' 'simclr_resnet50x1']
    The following model(s) were not found: subject-*
    List of possible models in this dataset:
    ['bagnet33' 'resnet50' 'simclr_resnet50x1']
    Traceback (most recent call last):
      File "examples/evaluate.py", line 28, in <module>
        run_plotting()
      File "examples/evaluate.py", line 18, in run_plotting
        figure_directory_name = figure_dirname)
      File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 108, in plot
        result_dir=result_dir)
      File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 744, in plot_accuracy
        result_dir=result_dir, plot_type="accuracy")
      File "/home/eric/model_vs_human/modelvshuman/plotting/plot.py", line 772, in plot_general_analyses
        experiment=e)
      File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 254, in get_result_df
        r = self.analysis(subdat)
      File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 284, in analysis
        self._check_dataframe(df)
      File "/home/eric/model_vs_human/modelvshuman/plotting/analyses.py", line 24, in _check_dataframe
        assert len(df) > 0, "empty dataframe"
    AssertionError: empty dataframe

Feature request: Simpler loading of custom models

Using your toolbox with the built-in models is straightforward, but we would like to compare some custom pytorch models.
It would be great to have a routine to add these models (i.e. subclasses of nn.Module) to the toolbox registry from your own script. If this is already possible, it would be great if you could share an example.

Currently, we add the model inside the toolbox's files, which makes extensions complicated and redundant (e.g. the model name appears in the path, in the function name, and in the plotting routine).

Thanks
David

Feature request: Machine readable results, e.g. CSV

Currently, the toolbox saves Latex tables and plots with the resulting accuracies and error consistencies.
For custom plotting routines and similar, it would be great to have these numbers additionally in an easy-to-parse format, e.g. CSV or JSON.

Question about shape bias

Could you please provide the formula used to compute the shape bias? Currently, I can successfully run this repo with my own model, but I am confused about how the shape bias is obtained. I would be very grateful if you could elaborate. Thanks!
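For reference, my understanding of the definition from the texture-shape cue-conflict paper cited in the Credit section: on cue-conflict images, only trials where the model chooses either the correct shape category or the correct texture category are counted, and the shape bias is the shape fraction among those trials. The snippet below restates that definition and is not taken from the toolbox's own code.

    # Shape-bias definition as described in the cue-conflict paper,
    # restated here as a reference sketch (not the toolbox's implementation).
    def shape_bias(n_shape_decisions, n_texture_decisions):
        # Fraction of shape decisions among cue-conflict trials where the
        # response matched either the shape or the texture category.
        return n_shape_decisions / (n_shape_decisions + n_texture_decisions)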

dataset "original"

From what I can see, most distortions include undistorted images in their evaluation, so the automatically generated plots work without a separate scoring run on the original dataset. Is there any analysis for which I need to specifically obtain the "original" dataset?

dataset path?

Hi --- first thank you a lot for this useful dataset and API!

I'm trying to load the datasets using the following code, but got an error message saying the sketch dataset path was not found: model-vs-human/datasets/sketch/dnn/

Does this mean that I should download the sketch image dataset myself from the original source and put it under this path? A little more documentation on how the image datasets should be structured would be very useful!

from modelvshuman.datasets import sketch
dataset = sketch(batch_size=16, num_workers=4)

Thank you!

Question Regarding Human Raw Dataset

Hi! Thank you for providing a valuable dataset.

As I dug into the raw data containing the human annotations, some questions came to mind.
Below is how I analyzed the raw dataset for the contrast experiment.

For an image named 0580_cop_dnn_c05_bicycle_10_n03792782_10129.png, I extracted rows in which image_id matches 'c05_bicycle_10_n03792782_10129.png'.
Then, I got the following 4 rows. Here, human_annotation is a simple concatenation of all csv files in the contrast experiment.

Below is the visualization of image 0580_cop_dnn_c05_bicycle_10_n03792782_10129.png

Although the image is not very clear, I do not fully agree with the human predictions. (They predicted 'clock', 'oven', 'bear', and 'keyboard'; it is very clear that the figure is not a keyboard.)
Is there anything wrong with, or missing from, my analysis?

Thanks,

Request for data to reproduce figures

Very interesting work, and thanks very much for releasing it publicly. We are working on extending some of your results/studies; could you please provide more information related to Figure 2?

Figure 2 compares several models, but they are not all labeled. I am interested in finding out the accuracy-distortion tradeoff for each model. The figure shows this information at a coarse level, such as comparing all self-supervised models, adversarially trained models, etc. It would be perfect if you could provide the model name (corresponding to your model zoo) and its performance at the various distortion levels for the 12 distortion types considered.

Thanks again, looking forward to hearing from you!

Difficulties loading adversarially trained models

I didn't succeed in loading the adversarially trained models. run_evaluation results in the following error:

HTTP Error 403: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

Any idea how to fix this?

BiT models via timm?

I had difficulties obtaining the BiT models via pytorch image models (timm). I then used, e.g.,

    import timm
    m = timm.create_model('resnetv2_152x12_bitm', pretrained=True)

in the pytorch model_zoo.py.

This worked perfectly.
