
fastseq's Introduction

FastSeq


Introduction

FastSeq provides efficient implementations of popular sequence models (e.g., BART, ProphetNet) for text generation, summarization, and translation tasks. It automatically optimizes inference speed on top of popular NLP toolkits (e.g., FairSeq and HuggingFace-Transformers) without accuracy loss. All of this requires no changes to code, models, or data when using our command-line tool, or just a one-line addition, import fastseq, when using the source code.

Features:

Speed Gain

The table below shows the generation speed gain from using FastSeq.

| Model | W/O FastSeq (samples/s) | W/ FastSeq (samples/s) | Speedup |
|---|---|---|---|
| ProphetNet (fs) | 2.8 | 11.9 | 4.3x |
| Bart (fs) | 3.3 | 25.1 | 7.7x |
| Bart (hf) | 4.5 | 12.4 | 2.8x |
| DistilBart (hf) | 5.5 | 19.1 | 3.5x |
| T5 (hf) | 9.5 | 31.7 | 3.3x |
| WMT16 En-De (fs) | 144.5 | 422.8 | 2.9x |
| GPT2 (hf) | 0.9 | 7.1 | 7.9x |
| ProphetNet (hf) | 3.4 | 6.2 | 1.8x |
  • All benchmarking experiments were run on an NVIDIA V100 16GB GPU with Docker. The highest speed for each model was recorded after tuning the batch size. For the parameter settings, click the link of the corresponding model.
  • The baseline (W/O FastSeq) for ProphetNet (fs) is run with fairseq 0.9.0, as ProphetNet has not yet been updated for compatibility with version 0.10.2.
  • fs stands for Fairseq version 0.10.2; hf stands for Huggingface Transformers version 4.12.0.
  • Optimizations are automatically applied to all generation/sequence models in Fairseq and Huggingface Transformers; the table above lists only a subset of them.

How it works

FastSeq develops multiple speedup techniques, including an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations support various Transformer-based model architectures, such as encoder-decoder, decoder-only, and encoder-only models. The more efficient implementations in FastSeq are automatically patched in to replace the corresponding ones in existing NLP toolkits (e.g., HuggingFace-Transformers and FairSeq), so no significant code changes are needed to integrate FastSeq with these toolkits.
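
The patching idea can be pictured with a minimal, hypothetical sketch (this is not FastSeq's actual code): on import, the library looks up a class owned by the host toolkit and swaps one of its methods for a faster drop-in replacement with the same signature, so callers keep using the same API.

# Hypothetical illustration of import-time patching (not FastSeq's real code).
class ToolkitGenerator:
    """Stand-in for a generator class owned by an existing toolkit."""
    def generate(self, prompt):
        return f"slow generation for: {prompt}"

def _fast_generate(self, prompt):
    # Drop-in replacement with the same signature and return type.
    return f"fast generation for: {prompt}"

def apply_patches():
    # Replacing the method on the class affects every existing and future
    # instance, so downstream code needs no changes.
    ToolkitGenerator.generate = _fast_generate

apply_patches()  # a library would call this at import time
print(ToolkitGenerator().generate("hello"))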

Installation

Requirements

If you use only fairseq or only transformers, you need to install just that one; if you use both, install both.
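
As a quick, illustrative check of which backend(s) are present (not an official FastSeq utility), something like the following can be used:

# Illustrative check: fastseq only needs the toolkit(s) you actually use.
import importlib.util

for pkg in ("fairseq", "transformers"):
    status = "installed" if importlib.util.find_spec(pkg) else "not installed"
    print(f"{pkg}: {status}")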

Building the Dockerfile

The Dockerfile requires a base image to be specified.

cd fastseq/docker
# pass the base image name as a build-arg when building the image from the dockerfile
docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch:20.03-py3 .

Install from the source

# when fairseq and/or transformers has been installed
$ pip install git+https://github.com/microsoft/fastseq.git

# install fastseq + transformers
$ pip install git+https://github.com/microsoft/fastseq.git#egg=fastseq[transformers]

# install fastseq + fairseq
$ pip install git+https://github.com/microsoft/fastseq.git#egg=fastseq[fairseq]

# install fastseq + transformers + fairseq
$ pip install git+https://github.com/microsoft/fastseq.git#egg=fastseq[transformers,fairseq]

Usage

Use source code for speedup

Only one line of code change is needed to use the optimizations provided by FastSeq.

# import fastseq at the beginning of your program
import fastseq
import torch

# Download bart.large.cnn
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')

bart.cuda()  # use GPU
bart.eval()  # disable dropout for evaluation
bart.half()

slines = ['FastSeq provides efficient implementations of the popular sequence models. Please visit https://github.com/microsoft/fastseq for more details.']

hypotheses = bart.sample(
    slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)

print(hypotheses)
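
For HuggingFace-Transformers models, the same one-line import applies. Below is a minimal sketch (assuming Transformers 4.12.0 and the facebook/bart-large-cnn checkpoint); the generation parameters mirror the fairseq example above.

# import fastseq at the beginning of your program
import fastseq
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
model.cuda().eval().half()  # use GPU, disable dropout, fp16

text = ('FastSeq provides efficient implementations of the popular sequence models. '
        'Please visit https://github.com/microsoft/fastseq for more details.')
inputs = tokenizer([text], return_tensors='pt', truncation=True).to('cuda')

summary_ids = model.generate(
    inputs['input_ids'],
    num_beams=4,
    length_penalty=2.0,
    max_length=140,
    min_length=55,
    no_repeat_ngram_size=3)

print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))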

Use command line tool to speedup fairseq models

Example usage for the BART model on the CNN/DailyMail task.

$ fastseq-generate-for-fairseq \
    cnn_dnn/bin \
    --path bart.large.cnn/model.pt \
    --fp16 \
    --task translation \
    --batch-size 128 \
    --gen-subset valid \
    --truncate-source  \
    --bpe gpt2 \
    --beam 4 \
    --num-workers 4 \
    --min-len 55 \
    --max-len-b 140 \
    --no-repeat-ngram-size 3 \
    --lenpen 2.0

Both the model file and the task data file are the same as in the original Fairseq version.

Use command line tool to speedup transformers models

Example usage for the BART model on the CNN/DailyMail task.

$ fastseq-generate-for-transformers \
    facebook/bart-large-cnn \
    cnn_dm/val.source \
    out.summary \
    --reference_path cnn_dm/val.target \
    --device cuda \
    --bs 128 \
    --fp16 \
    --score_path out.score \
    --task summarization

Both the model file and the task data file are the same as in the original Transformers version.

Run tests

# run a single test.
$ python tests/optimizer/fairseq/test_fairseq_optimizer.py

# run all the tests.
$ python -m unittest discover -s tests/ -p '*.py'

# run all the benchmarks.
$ cd benchmarks && bash run_all_benchmarks.sh

Code Style

Python coding style

Changes to Python code should conform to PEP 8. Use yapf to help format the Python code and pylint to check your Python changes.

# format the code by yapf
$ yapf --style pep8 -i -r PYTHON_FILE/PACKAGE

# run pylint check
$ pylint --rcfile=.pylintrc  PYTHON_FILE/PACKAGE

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Citation

Please cite as:

@inproceedings{yan-etal-2021-fastseq,
    title = "{F}ast{S}eq: Make Sequence Generation Faster",
    author = "Yan, Yu and Hu, Fei and Chen, Jiusheng and Bhendawade, Nikhil and Ye, Ting and Gong, Yeyun and Duan, Nan and Cui, Desheng and Chi, Bingyu and Zhang, Ruofei",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
    year = "2021",
}

@InProceedings{pmlr-v139-yan21a,
    title = {EL-Attention: Memory Efficient Lossless Attention for Generation},
    author = {Yan, Yu and Chen, Jiusheng and Qi, Weizhen and Bhendawade, Nikhil and Gong, Yeyun and Duan, Nan and Zhang, Ruofei},
    booktitle = {Proceedings of the 38th International Conference on Machine Learning},
    pages = {11648--11658},
    year = {2021},
}

fastseq's People

Contributors

cep21, feihugis, fuliucansheng, jiushengchen, julianneknott, microsoftopensource, monologg, nicknickgo, yuyan2do


fastseq's Issues

fairseq/transformers unit tests modify the local environment

After running tests/run_fairseq_tests.py, the user's original Fairseq installation is deleted and replaced.

Before the run:
Location: /opt/conda/lib/python3.6/site-packages

After the run:
Location: /tmp/fairseq

The version may also change, and the same problem applies to Transformers.

This breaks the user's environment; the user then needs to reinstall the package.

Can we isolate the unit tests' pip environment from the user's local environment, e.g., with something like a virtual environment or conda?
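
One possible way to isolate the tests, sketched with the standard venv module (the path and package choices here are placeholders, not a documented FastSeq workflow):

# Hypothetical sketch: run the test suite in a throwaway virtual environment
# so the user's installed fairseq/transformers stay untouched.
import subprocess
import venv

ENV_DIR = "/tmp/fastseq-test-env"  # placeholder scratch path
venv.create(ENV_DIR, with_pip=True)

pip = f"{ENV_DIR}/bin/pip"
python = f"{ENV_DIR}/bin/python"

subprocess.check_call([pip, "install", "fairseq", "transformers"])
subprocess.check_call([pip, "install", "-e", "."])  # install fastseq from the repo root
subprocess.check_call([python, "-m", "unittest", "discover", "-s", "tests/", "-p", "*.py"])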

fairseq eval_lm

Have you guys looked at fairseq eval_lm?

It's not generative, but I was wondering whether any of these tricks would work.
Thanks in advance!

T5 speed

T5 speed with the latest code is lower than expected (Docker was used), which caused the benchmark test to fail.
$CUDA_VISIBLE_DEVICES=3 bash models/hf_t5.sh

| Util | Model | Task | Split | BatchSize | Bleu | Throughput (samples/s) | Expected |
|---|---|---|---|---|---|---|---|
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 27.44 | 5 | 5~5.5 |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 27.43 | 6.3 | 7~7.5 |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 27.42 | 7.2 | 7.9~8.4 |

Not sure if it is due to docker.

Memory not released for large BS on FastSeq

I'd like to report an issue where GPU memory is not released after a crash with a large batch size (BS) on FastSeq.

Impact:
I can reproduce it every time. Since the memory is not released after the crash, I am afraid that if this package is released to users, they may hit the same issue and will not find it easy to handle.

How to reproduce:
I tested on the gpu0 machine.
Below are the detailed steps to reproduce this issue:

  • Docker run image:
    sudo docker run --gpus all --privileged --name fastseq_dev_py3_tiy -it adsbrainwestus2.azurecr.io/fastseq:dev-py3 /bin/bash

  • Inside the container:

  1. Create an RSA key and add it to your GitHub account (just to make it easy to download the code)
  2. mkdir tiy && cd tiy
  3. Install the latest fastseq:
    git clone git@github.com:microsoft/fastseq.git
    cd fastseq
    pip install --editable ./
  4. cd benchmarks
    Set LOOP in utils.sh to be 1
  5. Run nvidia-smi a first time; there is no memory occupation, which is expected:

[nvidia-smi screenshot]

  6. Run ./benchmark.sh fairseq+fastseq bart.large.cnn cnn_dm/len-1024.bin valid 256
    Failed because of Bus error:
    Processing Loop=1/1 Util=fairseq_v0.9.0+fastseq_v0.0.3 Model=bart.large.cnn Task=cnn_dm/len-1024.bin Split=valid BS=256
    benchmark_seq.sh: line 55: 533 Bus error (core dumped) $util $data_dir --path $model_path --fp16 --task translation --batch-size $bs --gen-subset $split --truncate-source --bpe gpt2 --beam 4 --num-workers 4 --min-len 55 --max-len-b 140 --no-repeat-ngram-size 3 --lenpen 2.0 #--print-alignment #--print-step # KeyError: steps --skip-invalid-size-inputs-valid-test $* > $STDOUT_FILE 2> $STDERR_FILE
    Failed at benchmark_seq.sh (line 80): $util $data_dir --path $model_path --fp16 --task translation --batch-size $bs --gen-subset $split --truncate-source --bpe gpt2 --beam 4 --num-workers 4 --min-len 55 --max-len-b 140 --no-repeat-ngram-size 3 --lenpen 2.0 #--print-alignment #--print-step # KeyError: steps --skip-invalid-size-inputs-valid-test $* > $STDOUT_FILE 2> $STDERR_FILE

  7. Run nvidia-smi a second time; there is memory occupation on GPU 0:

[nvidia-smi screenshot]

Other information:
I re-ran it 5 times to check whether there was any information in fastseq.stderr. Most of the time, there is no error message in fastseq.stderr.

  • 4 times, there was no error message in fastseq.stderr

root@6e86574394fb:/workspace/tiy/fastseq/benchmarks# cat /tmp/fastseq.stderr
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "

  • 1 time, an EOFError was recorded in fastseq.stderr

root@6e86574394fb:/workspace/tiy/fastseq/benchmarks# cat /tmp/fastseq.stderr
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/multiprocessing/resource_sharer.py", line 142, in _serve
with self._listener.accept() as conn:
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 456, in accept
answer_challenge(c, self._authkey)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 383, in recv
raise EOFError
EOFError

Any end-to-end inference example with Google Colab & HuggingFace

Hi Team,

Thanks a lot for this.

Few questions -

  1. Is the speedup only for GPU, or is inference on CPU also boosted?

  2. Wondering whether an inference example with T5/BART summarization from Hugging Face could be provided in a Colab notebook or similar, which would be easier to adopt.

Sorry if it is a bit of a stretch to request this. Appreciate you reading this.

Transformers unit tests failure

test_modeling_t5.py fails when fastseq is imported.

Steps to replicate:

  1. Clone from the test_wrapper branch and cd to the tests directory.
  2. CUDA_VISIBLE_DEVICES=<> bash run_transformers_tests.py

ModuleNotFoundError: No module named 'fastseq.models'

Hello,

The fastseq library, installed through pip, seems to have the models directory missing and crashes with the following exception:

File "/root/miniconda3/envs/Env37/lib/python3.7/site-packages/fastseq/init.py", line 9, in
import fastseq.models # pylint: disable=wrong-import-position
ModuleNotFoundError: No module named 'fastseq.models'

And while this directory can be found in the repository here, I can only find prophetnet, while I am looking for BART.
Could you give me a hint on how to install the corresponding models?

Thanks and happy holidays :-)

RuntimeError: CUDA error: no kernel image is available for execution on the device

I am trying to use your repeat ngram extension, but when I switch GPUs (without rebuilding the extension) it breaks with RuntimeError: CUDA error: no kernel image is available for execution on the device. If I rerun python setup.py build_ext --inplace, it works again. Any clues on how to build the extension so that it works on a different GPU (same CUDA version, same Python version, same torch) than the one it was built on?
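
One way to see why the prebuilt extension stops working is to compare the compute capability of the new GPU against the architecture(s) the extension was compiled for; if they differ, the extension has to be rebuilt with that architecture included (for example via the TORCH_CUDA_ARCH_LIST environment variable). A small check, offered as a sketch rather than an official diagnostic:

# Sketch: print the compute capability of the visible GPU. If the ngram
# extension was not built for this architecture, rebuild it with this
# value included (e.g. via TORCH_CUDA_ARCH_LIST).
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")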

Also, we're considering pulling some of these changes back into fairseq, if that's alright with you guys!

Running error with PyTorch 1.12.1

The following error occurs with PyTorch 1.12.1, but disappears with PyTorch 1.11.0

  File "/home/tangtianyi/transformers/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 26, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/tangtianyi/transformers/src/transformers/modeling_utils.py", line 41, in <module>
    from .generation_utils import GenerationMixin
  File "/home/tangtianyi/transformers/src/transformers/generation_utils.py", line 29, in <module>
    from .generation_logits_process import (
  File "/home/tangtianyi/transformers/src/transformers/generation_logits_process.py", line 25, in <module>
    import ngram_repeat_block_cuda
ImportError: /home/tangtianyi/miniconda3/lib/python3.8/site-packages/ngram_repeat_block_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tangtianyi/model/pretrained_models.py", line 31, in <module>
    from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForSeq2SeqLM, EncoderDecoderModel
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/tangtianyi/transformers/src/transformers/utils/import_utils.py", line 948, in __getattr__
    value = getattr(module, name)
  File "/home/tangtianyi/transformers/src/transformers/utils/import_utils.py", line 947, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/tangtianyi/transformers/src/transformers/utils/import_utils.py", line 959, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.encoder_decoder.modeling_encoder_decoder because of the following error (look up to see its traceback):
/home/tangtianyi/miniconda3/lib/python3.8/site-packages/ngram_repeat_block_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

Compatible with torch-1.6.0

fastseq-generate works on torch 1.5.0, but when running fastseq-generate on torch 1.6.0, I got the following error:

Traceback (most recent call last):
  File "/datadrive/jiuchen/src/git.fastseq/fastseq_cli/generate.py", line 14, in <module>
    cli_main()
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/datadrive/jiuchen/src/git.fastseq/fastseq/optimizer/fairseq/generate_v1.py", line 113, in main_v1
    prefix_tokens)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 265, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/sequence_generator.py", line 113, in generate
    return self._generate(model, sample, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/datadrive/jiuchen/src/git.fastseq/fastseq/optimizer/fairseq/beam_search_optimizer_v1.py", line 704, in _generate
    scores.view(bsz, beam_size, -1)[:, :, :step],
  File "/usr/local/lib/python3.6/dist-packages/fairseq/search.py", line 81, in step
    torch.div(self.indices_buf, vocab_size, out=self.beams_buf)
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
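
The error message itself points at the fix: the integer tensor division in fairseq's search.py has to become an explicit floor division on torch 1.6.0. A minimal illustration of that change (in the spirit of the suggested fix, not a verbatim patch):

# Minimal illustration of the change suggested by the error message:
# replace integer tensor division with an explicit floor division.
import torch

indices_buf = torch.tensor([0, 7, 15, 23])
vocab_size = 8

# Old (raises on torch 1.6.0 for integer tensors):
#   beams_buf = torch.div(indices_buf, vocab_size)

# New: explicit floor division keeps the integer semantics.
beams_buf = torch.floor_divide(indices_buf, vocab_size)
print(beams_buf)  # tensor([0, 0, 1, 2])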

Support for current fairseq 0.10.2

The supported version of fairseq is 0.9.0, from late 2019. The current version has some breaking API changes. Is there a plan to update fastseq to support the current fairseq?
Thanks!

Where to read EL-Attention source code for huggingface-transformers

We are very interested in your work, thank you. We have read your paper "EL-Attention". More comprehensive examples can be found here for huggingface-transformers, but the self-attention there saves the key and value, not only the hidden_states. EL-Attention shows that saving only the hidden_states can halve the memory.
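
A back-of-the-envelope sketch of the memory argument (not FastSeq or paper code): a per-layer key/value cache stores two tensors the size of the hidden states, so caching only the hidden states roughly halves the cache memory.

# Rough element count: caching K and V stores two hidden-state-sized tensors
# per layer, while caching only hidden states stores one, i.e. about half.
batch, seq_len, hidden, num_layers = 32, 1024, 1024, 12

kv_cache = num_layers * 2 * batch * seq_len * hidden     # key + value per layer
hidden_cache = num_layers * batch * seq_len * hidden     # hidden states only

print(kv_cache / hidden_cache)  # -> 2.0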

Errors in test_fairseq_optimizer.py

After #27, we still see errors like the one below, and they are printed via logger.error. Should we make the test fail instead?

cd to project root dir
$CUDA_VISIBLE_DEVICES=3 python -m unittest discover tests/
...
ERROR 2020-09-02 16:33:06,101 test_fairseq_optimizer.py:119]
Mohammad Javad Zarif is the Iranian foreign minister. He has been U.S. Secretary of State John Kerry 's opposite number in nuclear talks. Zarif received a hero 's welcome as he arrived in Iran on a sunny Friday morning. The feds investigated him over his alleged role in controlling the Alavi Foundation.
 v.s.
Mohammad Javad Zarif is the Iranian foreign minister. He has been John Kerry 's opposite number in securing a breakthrough in nuclear discussions. Zarif received a hero 's welcome as he arrived in Iran on a sunny Friday morning. But there are some facts about Zarif that are less well-known.
....
----------------------------------------------------------------------
Ran 9 tests in 50.048s

OK

Does it support seq2seq models with an encoder and decoder based on LSTM or bi-LSTM?

Hi,
I want to run inference with a seq2seq model whose encoder and decoder are based on LSTM and bi-LSTM, and I found your project. All the models in the README used as examples of improved performance are Transformer-based, and I do not see other architectures like LSTM or conv. Can you confirm which types of model can get improved performance?
Thank you.

ACTION REQUIRED: Microsoft needs this private repository to complete compliance info

There are open compliance tasks that need to be reviewed for your fastseq repo.

Action required: 4 compliance tasks

To bring this repository to the standard required for 2021, we require administrators of this and all Microsoft GitHub repositories to complete a small set of tasks within the next 60 days. This is critical work to ensure the compliance and security of your Microsoft GitHub organization.

Please take a few minutes to complete the tasks at: https://repos.opensource.microsoft.com/orgs/microsoft/repos/fastseq/compliance

  • The GitHub AE (GitHub inside Microsoft) migration survey has not been completed for this private repository
  • No Service Tree mapping has been set for this repo. If this team does not use Service Tree, they can also opt-out of providing Service Tree data in the Compliance tab.
  • No repository maintainers are set. The Open Source Maintainers are the decision-makers and actionable owners of the repository, irrespective of administrator permission grants on GitHub.
  • Classification of the repository as production/non-production is missing in the Compliance tab.

You can close this work item once you have completed the compliance tasks, or it will automatically close within a day of taking action.

If you no longer need this repository, it might be quickest to delete the repo, too.

GitHub inside Microsoft program information

More information about GitHub inside Microsoft and the new GitHub AE product can be found at https://aka.ms/gim or by contacting [email protected]

FYI: current admins at Microsoft include @ruofeizhang, @yetingqiaqia, @JiushengChen, @yuyan2do, @feihugis, @NickNickGo

Illegal memory access when batch_size is between (128, 256)

tests/optimiser/fairseq/test_fairseq_optimiser.py works well when batch_size <= 128; however, when batch_size is set between 129 and 255, the error below is raised:

Traceback (most recent call last):
  File "/home/fhu/py-env/nlp/lib/python3.7/site-packages/absl/testing/parameterized.py", line 263, in bound_param_test
    test_method(self, **testcase_params)
  File "tests/optimiser/fairseq/test_fairseq_optimiser.py", line 101, in test_beam_search_optimiser
    no_repeat_ngram_size=no_repeat_ngram_size)
  File "/home/fhu/github/fairseq/fairseq/models/bart/hub_interface.py", line 107, in sample
    hypos = self.generate(input, beam, verbose, **kwargs)
  File "/home/fhu/github/fairseq/fairseq/models/bart/hub_interface.py", line 123, in generate
    prefix_tokens=sample['net_input']['src_tokens'].new_zeros((len(tokens), 1)).fill_(self.task.source_dictionary.bos()),
  File "/home/fhu/github/fairseq/fairseq/tasks/fairseq_task.py", line 361, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/home/fhu/py-env/nlp/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/fhu/github/fairseq/fairseq/sequence_generator.py", line 159, in generate
    return self._generate(sample, **kwargs)
  File "/home/fhu/github/fairseq/fairseq/sequence_generator.py", line 198, in _generate
    encoder_outs = self.model.forward_encoder(net_input)
  File "/home/fhu/github/fairseq/fairseq/sequence_generator.py", line 697, in forward_encoder
    for model in self.models
  File "/home/fhu/github/fairseq/fairseq/sequence_generator.py", line 697, in <listcomp>
    for model in self.models
  File "/home/fhu/github/fairseq/fairseq/models/fairseq_encoder.py", line 53, in forward_torchscript
    return self.forward_non_torchscript(net_input)
  File "/home/fhu/github/fairseq/fairseq/models/fairseq_encoder.py", line 62, in forward_non_torchscript
    return self.forward(**encoder_input)
  File "/home/fhu/github/fairseq/fairseq/models/transformer.py", line 411, in forward
    x = layer(x, encoder_padding_mask)
  File "/home/fhu/py-env/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fhu/github/fairseq/fairseq/modules/transformer_layer.py", line 122, in forward
    attn_mask=attn_mask,
  File "/home/fhu/py-env/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fhu/github/fastseq/fastseq/optimiser/fairseq/beam_search_optimiser_v2.py", line 200, in forward
    v_proj_weight=self.v_proj.weight,
  File "/home/fhu/py-env/nlp/lib/python3.7/site-packages/torch/nn/functional.py", line 3937, in multi_head_attention_forward
    float('-inf'),
RuntimeError: CUDA error: an illegal memory access was encountered

Support for HF's transformers 3.1+

Thanks for an excellent library.
Any plan to support the new model APIs introduced in transformers==3.1.0 and beyond?
Would it require major work, or are there pointers on how to adapt it?

Thank you!

NMT models speedup abnormally related to batch size

Hi, thanks for the great work. I just tested fairseq-generate on my test set (ZH-EN translation) using FastSeq and Fairseq, and the speedup is quite abnormal compared with the example link.
My test set has 1526 sentences with 5~150 Chinese characters each, and my experiment is on an NVIDIA Tesla T4. The translation model I used is the base Transformer architecture in fairseq, with 30 encoder layers.
I tested with the following commands:
for fairseq: fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size 128
for fastseq: fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 --postprocess-workers 5
I did not use --no-repeat-ngram-size in fastseq; the beam size is the default 5 and lenpen is 1.
My test result is as follows:

| BatchSize | not assigned | 128 | 10 | 5 | 1 |
|---|---|---|---|---|---|
| fairseq-0.10.2 | 65.79 sentences/s | 63.18 sentences/s | 19.06 sentences/s | 11.79 sentences/s | 3.06 sentences/s |
| above + fastseq | 75.55 sentences/s | 74.28 sentences/s | 17.38 sentences/s | 11.47 sentences/s | 2.92 sentences/s |

I found that when the batch size is large (such as 128 and above), FastSeq gives an obvious speedup (though not as much as 2x or more), but when the batch size is small (I tested this because I need the model for deployment in a real setting), FastSeq seems to give no speedup at all, and is even slower. I find this phenomenon quite abnormal and would like to ask for your help. Looking forward to your reply.
