
Transformer-Formal-Languages

On the Ability and Limitations of Transformers to Recognize Formal Languages

Transformers have supplanted recurrent models in a large number of NLP tasks. However, our understanding of their practical ability to model different behaviors relevant to sequence modeling is still nascent. On the other hand, a long line of research has sought to understand the computational capabilities of recurrent models such as LSTMs.

...

Recent work has demonstrated close connections between LSTMs and counter automata. In our work, we seek to understand the capabilities of Transformers to model languages for which the abilities of LSTMs are well understood. In the paper, we first show that Transformers are expressive enough to recognize a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations (Section 4). In experiments, we find that Transformers do well on this subclass, and their learned mechanism strongly correlates with our construction (Section 6).
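
To make the counter connection concrete: Dyck-1 (the language of balanced strings over a single bracket pair) can be recognized with one counter that is incremented on an opening bracket and decremented on a closing one. The sketch below is only an illustration of this counter mechanism, not the Transformer construction from the paper, and it assumes the brackets are written as '(' and ')':

def is_dyck1(s, open_b='(', close_b=')'):
    """Return True iff s is a balanced Dyck-1 string over one bracket pair."""
    depth = 0
    for ch in s:
        if ch == open_b:
            depth += 1          # counter goes up on an opening bracket
        elif ch == close_b:
            depth -= 1          # counter goes down on a closing bracket
            if depth < 0:       # closing bracket with no matching opener
                return False
        else:
            return False        # symbol outside the Dyck-1 alphabet
    return depth == 0           # accept only if every opener is closed

assert is_dyck1('(()())') and not is_dyck1('())(')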

...

Perhaps surprisingly, in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as the languages become more complex according to a well-known measure of complexity (Section 7). Our results indicate that, in contrast to LSTMs, Transformers achieve limited performance on languages that involve modeling periodicity, modular counting, and even simpler star-free variants of Dyck-1, despite recognizing Dyck-1 itself effortlessly. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and into the influence of positional encoding schemes on the learning and generalization abilities of the model.

Dependencies

  • Compatible with Python 3.6 and PyTorch 1.2.0
  • The necessary packages can be installed via requirements.txt.

Setup

First, create the required directories and extract the dataset by running setup.sh:

chmod a+x setup.sh
./setup.sh

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

We recommend creating a virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Finally, install the required packages by running:

pip install -r requirements.txt

Resources

We provide all the datasets mentioned in the paper:

  • Shuffle-2
  • TernaryBoolean
  • Counter-3: $a^nb^nc^n$
  • Parity
  • (aa)*
  • (abab)*
  • D-2: (a(ab)*b)*
  • D-4:
  • Tomita-{1-7}

Extract the provided data zip file in this repo's directory, such that you see a data/ subdirectory here containing all of the above-mentioned datasets.

Training the model

For training the model with default hyperparameter settings, execute the following command:

python -m src.main -mode train -run_name testrun -dataset <DatasetName> \
-model_type <ARCHITECTURE> -gpu <GPU-ID>
  • run_name: A unique identifier for an experiment; the locations for storing model checkpoints and logs are determined from it.
  • dataset: The dataset to train and validate the model on; choose from the list of datasets mentioned above.
  • model_type: The neural network architecture to use for the experiments; choose from RNN (an LSTM by default), SAN (Transformer encoder), or SAN-Rel (Transformer encoder with relative position encodings).
  • gpu: On a multi-GPU machine, the id of the GPU on which to run the training process; on a single-GPU machine, just put 0 to use the default GPU. Note that the code currently won't run without a GPU; support for running it on a CPU will be provided shortly.

Other hyperparameters can be found in the file src/args.py. Some important ones worth noting are given below:

  • pos_encode: Only applicable when model_type is SAN. Adding -pos_encode to the training command described above initializes a Transformer that uses absolute positional encodings; without it, the model does not use any form of positional encoding.
  • d_model: Applicable for model_type SAN and SAN-Rel; the size of the intermediate vectors used in the network. Example usage: -d_model 32.
  • heads: Also applicable for SAN and SAN-Rel only; specifies the number of attention heads to use.
  • depth: The number of layers to initialize the network with. A full example command combining these options is sketched below.
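
For instance, an illustrative invocation that trains a 2-layer, 4-head Transformer with absolute positional encodings on the Parity dataset could look like the following (the run name and hyperparameter values are examples, not the settings used in the paper):

python -m src.main -mode train -run_name san_parity -dataset Parity \
-model_type SAN -gpu 0 -pos_encode -d_model 32 -heads 4 -depth 2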

Generating Other Datasets

Apart from the datasets mentioned above, the supplementary material contains a list of many more formal languages. Although not all of those datasets are provided by us, it is easy to generate them using our data generation scripts. For example, if you want to generate data for Shuffle-4, you just need to run:

python generate_data.py -lang Shuffle -num_par 4 -dataset Shuffle-4 \
-lower_window 2 -upper_window  50 -training_size 10000 -test_size 2000 -bins 2 -len_incr 50

Here,

  • lang: The class of language for which to generate data; can be Shuffle, Dyck, Parity, Tomita, etc.
  • num_par: Selects a particular language from a family of languages; specifying 4 here generates Shuffle-4 in particular.
  • dataset: The directory name where the generated data will be saved. The same name must be used in the training command if you wish to train on the generated dataset.
  • lower_window: Minimum length of the strings in the training dataset.
  • upper_window: Maximum length of the strings in the training dataset.
  • training_size: Number of strings to generate for the training data.
  • test_size: Number of strings to generate for the test/validation data.
  • bins: Number of validation bins.
  • len_incr: Width of each validation window.
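
Similarly, data for other families can be generated by changing -lang and -num_par. For example, a hypothetical command for generating a two-bracket Dyck dataset could look like the one below, assuming -num_par selects the number of bracket pairs for the Dyck family and that the same windowing flags apply (see generate_data.py for the exact options):

python generate_data.py -lang Dyck -num_par 2 -dataset Dyck-2 \
-lower_window 2 -upper_window 50 -training_size 10000 -test_size 2000 -bins 2 -len_incr 50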

Update: All the datasets used in the main paper as well as the appendix can be downloaded via this link

For any clarifications, comments, or suggestions, please contact Satwik or Kabir.
