
Simplicity Bias in Transformers

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions, which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels, whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence of differences in the inductive biases of Transformers and recurrent models, which may help explain Transformers' effective generalization performance despite relatively limited expressiveness.
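
Here, sensitivity measures how many single-bit flips of an input change a Boolean function's output, averaged over inputs. The minimal sketch below is illustrative only (it is not code from this repository): the full parity over all n bits has average sensitivity n, whereas a sparse parity that depends on only a few bits has correspondingly low sensitivity.

from itertools import product

def average_sensitivity(f, n):
    # Exact average sensitivity of f: {0,1}^n -> {0,1}, computed by enumerating
    # all inputs and counting, per input, how many single-bit flips change the output.
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = list(x)
            flipped[i] ^= 1
            total += f(x) != f(tuple(flipped))
    return total / 2 ** n

n = 10
full_parity = lambda x: sum(x) % 2                  # depends on all n bits
sparse_parity = lambda x: (x[1] + x[4] + x[7]) % 2  # depends on only 3 bits

print(average_sensitivity(full_parity, n))    # 10.0
print(average_sensitivity(sparse_parity, n))  # 3.0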

...

Dependencies

  • Compatible with Python 3
  • Dependencies can be installed using Transformer-Simplicity/requirements.txt

Setup

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

At Transformer-Simplicity/:

$ pip install -r requirements.txt

Models

The repository includes four directories implementing different models and settings:

  • Training Transformers on Boolean functions: Transformer-Simplicity/FLTAtt
  • Training LSTMs on Boolean functions: Transformer-Simplicity/FLTClassifier
  • Experiments with Random Transformers: Transformer-Simplicity/RandFLTAtt
  • Experiments with Random LSTMs: Transformer-Simplicity/RandFLTClassifier

Usage

The available command-line arguments are listed in each directory's args.py file. Below, we illustrate training a Transformer on sparse parities; follow the same procedure for any experiments with LSTMs. A brief sketch of the sparse-parity task itself follows the command.

At Transformer-Simplicity/FLTAtt:

$ python -m src.main -mode train -gpu 0 -dataset sparity40_5k -run_name trafo_sparity_40_5k -depth 4 -lr 0.001
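
For intuition about the task, here is a hypothetical sketch of how a k-sparse parity dataset in the spirit of sparity40_5k (40-bit inputs, 5k training examples) could be generated. The repository ships its own train.pkl/dev.pkl files; the subset size, sample counts, and output format below are illustrative assumptions, not the actual data-generation code.

import random

def make_sparse_parity_data(n_bits=40, k=3, n_samples=5000, seed=0):
    # Label each random bit string by the parity of a fixed hidden subset of k bits.
    rng = random.Random(seed)
    relevant = sorted(rng.sample(range(n_bits), k))
    data = []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        y = sum(x[i] for i in relevant) % 2
        data.append((x, y))
    return data, relevant

train_data, relevant_bits = make_sparse_parity_data()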

To compute the sensitivity of randomly initialized Transformers, run the following (a sketch of the estimation procedure is given after the command).
At Transformer-Simplicity/RandFLTAtt:

$ python rand_sensi.py -gpu 0 -sample_size 1000 -len 20 -trials 100
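
Conceptually, this estimates how often flipping a single input bit changes the model's prediction, averaged over randomly sampled inputs. The sketch below outlines such a Monte Carlo estimate; it is not the repository's rand_sensi.py implementation, and model is assumed to be any callable mapping a 0/1 sequence to a 0/1 prediction.

import random

def estimate_sensitivity(model, seq_len=20, sample_size=1000, seed=0):
    # Average, over random inputs, of the number of positions whose flip
    # changes the model's prediction.
    rng = random.Random(seed)
    total = 0
    for _ in range(sample_size):
        x = [rng.randint(0, 1) for _ in range(seq_len)]
        y = model(x)
        for i in range(seq_len):
            x[i] ^= 1              # flip bit i
            total += model(x) != y
            x[i] ^= 1              # restore it
    return total / sample_size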

Citation

If you use our data or code, please cite our work:

@inproceedings{bhattamishra-etal-2023-simplicity,
    title = "Simplicity Bias in Transformers and their Ability to Learn Sparse {B}oolean Functions",
    author = "Bhattamishra, Satwik  and
      Patel, Arkil  and
      Kanade, Varun  and
      Blunsom, Phil",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.317",
    pages = "5767--5791",
}

For any clarifications, comments, or suggestions, please contact Satwik or Arkil.

Issues

Some clarifications regarding results on different datasets

Hi there, I've tried to replicate the results on some of the datasets but couldn't get good results.
For instance, on the dataset sparity40_5k (I also tried sparity40_25h and sparse_parity4a), as you can see below, training for 1500 epochs doesn't get beyond 52% validation accuracy.

Do you have any suggestions?
On a side note, I've noticed that for most datasets the validation set has far more samples than the training set. Why is that? Shouldn't the training set have more samples and the validation set be the smaller fraction?

For instance, sparity40_5k has 5k samples in train.pkl and 10k in dev.pkl; similarly, sparity40_25h has 2.5k samples in train.pkl and 10k in dev.pkl.

{
    "trafo_sparity_40_5k": {
        "run_name": "trafo_sparity_40_5k",
        "val_score": 0.5214000000000001,
        "train_acc": 0.9912000000000001,
        "best_epoch": 929,
        "dataset": "sparity40_5k",
        "heads": 4,
        "d_model": 64,
        "depth": 4,
        "dropout": 0.1,
        "lr": 0.001,
        "batch_size": 500,
        "epochs": 1500,
        "opt": "adam"
    }
}
