
Simplicity Bias in Transformers

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions, which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels, whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence of differences in the inductive biases of Transformers and recurrent models, which may help explain Transformers' effective generalization performance despite relatively limited expressiveness.
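
Here, sensitivity measures how many single-bit flips of an input change a Boolean function's output, averaged over inputs. The minimal sketch below is illustrative only (it is not code from this repository): the full parity over all n bits has average sensitivity n, whereas a sparse parity that depends on only a few bits has correspondingly low sensitivity.

from itertools import product

def average_sensitivity(f, n):
    # Exact average sensitivity of f: {0,1}^n -> {0,1}, computed by enumerating
    # all inputs and counting, per input, how many single-bit flips change the output.
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = list(x)
            flipped[i] ^= 1
            total += f(x) != f(tuple(flipped))
    return total / 2 ** n

n = 10
full_parity = lambda x: sum(x) % 2                  # depends on all n bits
sparse_parity = lambda x: (x[1] + x[4] + x[7]) % 2  # depends on only 3 bits

print(average_sensitivity(full_parity, n))    # 10.0
print(average_sensitivity(sparse_parity, n))  # 3.0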

...

Dependencies

  • Compatible with Python 3
  • Dependencies can be installed using Transformer-Simplicity/requirements.txt

Setup

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

At Transformer-Simplicity/:

$ pip install -r requirements.txt

Models

The repository includes four directories implementing different models and settings:

  • Training Transformers on Boolean functions: Transformer-Simplicity/FLTAtt
  • Training LSTMs on Boolean functions: Transformer-Simplicity/FLTClassifier
  • Experiments with Random Transformers: Transformer-Simplicity/RandFLTAtt
  • Experiments with Random LSTMs: Transformer-Simplicity/RandFLTClassifier

Usage

The available command-line arguments are listed in each directory's args.py file. Below, we illustrate training a Transformer on sparse parities; follow the same procedure for any experiments with LSTMs. A brief sketch of the sparse-parity task itself follows the command.

At Transformer-Simplicity/FLTAtt:

$ python -m src.main -mode train -gpu 0 -dataset sparity40_5k -run_name trafo_sparity_40_5k -depth 4 -lr 0.001
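
For intuition about the task, here is a hypothetical sketch of how a k-sparse parity dataset in the spirit of sparity40_5k (40-bit inputs, 5k training examples) could be generated. The repository ships its own train.pkl/dev.pkl files; the subset size, sample counts, and output format below are illustrative assumptions, not the actual data-generation code.

import random

def make_sparse_parity_data(n_bits=40, k=3, n_samples=5000, seed=0):
    # Label each random bit string by the parity of a fixed hidden subset of k bits.
    rng = random.Random(seed)
    relevant = sorted(rng.sample(range(n_bits), k))
    data = []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        y = sum(x[i] for i in relevant) % 2
        data.append((x, y))
    return data, relevant

train_data, relevant_bits = make_sparse_parity_data()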

To compute the sensitivity of randomly initialized Transformers, run the following (a sketch of the estimation procedure is given after the command).
At Transformer-Simplicity/RandFLTAtt:

$ python rand_sensi.py -gpu 0 -sample_size 1000 -len 20 -trials 100
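
Conceptually, this estimates how often flipping a single input bit changes the model's prediction, averaged over randomly sampled inputs. The sketch below outlines such a Monte Carlo estimate; it is not the repository's rand_sensi.py implementation, and model is assumed to be any callable mapping a 0/1 sequence to a 0/1 prediction.

import random

def estimate_sensitivity(model, seq_len=20, sample_size=1000, seed=0):
    # Average, over random inputs, of the number of positions whose flip
    # changes the model's prediction.
    rng = random.Random(seed)
    total = 0
    for _ in range(sample_size):
        x = [rng.randint(0, 1) for _ in range(seq_len)]
        y = model(x)
        for i in range(seq_len):
            x[i] ^= 1              # flip bit i
            total += model(x) != y
            x[i] ^= 1              # restore it
    return total / sample_size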

Citation

If you use our data or code, please cite our work:

@inproceedings{bhattamishra-etal-2023-simplicity,
    title = "Simplicity Bias in Transformers and their Ability to Learn Sparse {B}oolean Functions",
    author = "Bhattamishra, Satwik  and
      Patel, Arkil  and
      Kanade, Varun  and
      Blunsom, Phil",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.317",
    pages = "5767--5791",
}

For any clarifications, comments, or suggestions, please contact Satwik or Arkil.

Issues

Some clarifications regarding results on different datasets

Hi there, I've tried to replicate the results on some of the datasets but couldn't get good results.
For instance, on the dataset sparity40_5k (I also tried sparity40_25h and sparse_parity4a), as you can see below, training for 1500 epochs doesn't get beyond 52% validation accuracy.

Do you have any suggestions?
On a side note, I've noticed that for most datasets the validation set has far more samples than the training set. Why is that? Shouldn't the training set have more samples and the validation set be the smaller fraction?

For instance, sparity40_5k has 5k samples in train.pkl and 10k in dev.pkl; similarly, sparity40_25h has 2.5k samples in train.pkl and 10k in dev.pkl.

{
    "trafo_sparity_40_5k": {
        "run_name": "trafo_sparity_40_5k",
        "val_score": 0.5214000000000001,
        "train_acc": 0.9912000000000001,
        "best_epoch": 929,
        "dataset": "sparity40_5k",
        "heads": 4,
        "d_model": 64,
        "depth": 4,
        "dropout": 0.1,
        "lr": 0.001,
        "batch_size": 500,
        "epochs": 1500,
        "opt": "adam"
    }
}
