thu-coai / da-transformer

Official Implementation for the ICML2022 paper "Directed Acyclic Transformer for Non-Autoregressive Machine Translation"

License: Other

Python 96.31% C++ 0.64% Cuda 2.50% Cython 0.37% Shell 0.06% Lua 0.12% C 0.01%

machine-translation natural-language-generation pytorch text-generation

da-transformer's Issues

Running process stopped at “compiling cuda operations”

Hello! The code starts running successfully, but when it reaches this step it hangs and does not continue, without printing any error. Do you have any advice on this problem?

2022-10-18 17:05:27 | INFO | fairseq.utils | ***********************CUDA enviroments for all 4 workers***********************
2022-10-18 17:05:27 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2022-10-18 17:05:27 | INFO | fairseq_cli.train | max tokens per device = 2048 and max sentences per device = None
2022-10-18 17:05:27 | INFO | fairseq.trainer | Preparing to load checkpoint ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | No existing checkpoint found ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | loading train data for epoch 1
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.en
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.de
2022-10-18 17:05:28 | INFO | fairseq.tasks.translation | ./bin_data/WMT16 train en-de 4500966 examples
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:35 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1278
2022-10-18 17:05:35 | INFO | fairseq.trainer | begin training epoch 1
2022-10-18 17:05:35 | INFO | fairseq_cli.train | Start iterating over samples
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
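One thing that may be worth checking for a hang like this, offered as a guess rather than a diagnosis: PyTorch's JIT extension loader places a file named `lock` in the extension build directory while compiling, and a leftover lock from an interrupted build can make later runs wait. The sketch below only lists such files. The default cache path `~/.cache/torch_extensions` is an assumption about the setup (it can be overridden with `TORCH_EXTENSIONS_DIR`), and the helper names are hypothetical.

```python
import os
from pathlib import Path


def find_extension_locks(root):
    """Return all files named 'lock' below root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


def default_extensions_dir():
    # Assumption: TORCH_EXTENSIONS_DIR overrides the default cache location.
    default = Path.home() / ".cache" / "torch_extensions"
    return Path(os.environ.get("TORCH_EXTENSIONS_DIR", default))


if __name__ == "__main__":
    # Print candidate lock files; inspect them before deleting anything.
    for lock in find_extension_locks(default_extensions_dir()):
        print(lock)
```

If a lock file is listed while no training process is running, removing it (or the whole build directory) and restarting is a low-risk experiment.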

The speedup of using the CUDA operations compared with PyTorch native operations

Thanks for your great work. How much speedup do the CUDA operations give compared with PyTorch native operations? Also, is there a good tutorial for getting started with CUDA programming?

Would you like to share the distilled datasets?

Hi,
Thanks for your nice paper and code!
Would you like to share the distilled datasets used in this paper?

dag_best_alignment: graph size is too small

Hello, this is great work on NAT and I like it. I tried to modify src-upsample-scale to make the lambda smaller, like 2 or 4, but it raises an error: "dag_best_alignment.cu:68: calculate_maxalpha_kernel: block: [0,77,0], thread: [0,244,0] Assertion output_len >= target_len && "dag_best_alignment: graph size is too small (smaller than target length)" failed."
Do you know how to fix this error? Thank you
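The assertion in the message requires `output_len >= target_len`. A minimal sketch of that condition follows, assuming the graph size is `int(src_len * upsample_scale)`. That formula and the helper names are assumptions for illustration, not taken from the repository.

```python
def graph_size(src_len, upsample_scale):
    # Assumption: the graph has int(src_len * upsample_scale) vertices.
    return int(src_len * upsample_scale)


def too_small_pairs(pairs, upsample_scale):
    """Return the (src_len, tgt_len) pairs that would trip the assertion."""
    return [(s, t) for s, t in pairs if graph_size(s, upsample_scale) < t]


# Hypothetical length pairs, only to show how the scale changes the outcome.
pairs = [(10, 12), (5, 14), (8, 40)]
print(too_small_pairs(pairs, 2))  # [(5, 14), (8, 40)]
print(too_small_pairs(pairs, 4))  # [(8, 40)]
```

Under this assumption, a smaller scale leaves more sentence pairs whose target is longer than the graph, so such pairs would have to be filtered out or the scale increased.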

Can not reproduce the result when factor=4

Hello, I tried to reproduce the factor=4 setting with Lookahead decoding, but got 25.64 BLEU, which is lower than the 26.14 BLEU reported in the paper on WMT'14 EN-DE raw data. I used the same environment, training script, decoding script, and dataset, but still cannot match the reported score. Can you help me? Or can you share the checkpoints for WMT14 EN-DE raw data and distilled data?

My Training Script

fairseq-train ${data_dir}  \
    \
    `# loading DA-Transformer plugins` \
    --user-dir fs_plugins \
    \
    `# DA-Transformer Task Configs` \
    --task translation_dat_task \
    --upsample-base source --upsample-scale 4 \
    --filter-max-length 128:1024 --filter-ratio 2 \
    --skip-invalid-size-inputs-valid-test \
    \
    `# DA-Transformer Architecture Configs` \
    --arch glat_decomposed_link_base \
    --links-feature feature:position \
    --max-source-positions 128 --max-target-positions 1024 \
    --encoder-learned-pos --decoder-learned-pos \
    --share-all-embeddings --activation-fn gelu --apply-bert-init \
    \
    `# DA-Transformer Decoding Configs (See more in the decoding section)` \
    --decode-strategy lookahead --decode-upsample-scale 4.0 \
    \
    `# DA-Transformer Criterion Configs` \
    --criterion nat_dag_loss \
    --length-loss-factor 0 --max-transition-length 99999 \
    --glat-p 0.5:0.1@200k --glance-strategy number-random \
    --no-force-emit \
    \
    `# Optimizer & Regularizer Configs` \
    --optimizer adam --adam-betas '(0.9,0.999)' --fp16 \
    --label-smoothing 0.0 --weight-decay 0.01 --dropout 0.1 \
    --lr-scheduler inverse_sqrt  --warmup-updates 10000   \
    --clip-norm 0.1 --lr 0.0005 --warmup-init-lr '1e-07' --stop-min-lr '1e-09' \
    \
    `# Training Configs` \
    --max-tokens 32392  --max-tokens-valid 4096 --update-freq 1 \
    --max-update 300000  --grouped-shuffling \
    --max-encoder-batch-tokens 8000 --max-decoder-batch-tokens 34000 \
    --seed 0 --ddp-backend c10d --required-batch-size-multiple 1 \
    \
    `# Validation Configs` \
    --valid-subset valid \
    --validate-interval 1       --validate-interval-updates 10000 \
    --eval-bleu --eval-bleu-detok space --eval-bleu-remove-bpe --eval-bleu-print-samples --eval-tokenized-bleu \
    --fixed-validation-seed 7 \
    \
    `# Checkpoint Configs` \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-interval 1  --save-interval-updates 10000 \
    --keep-best-checkpoints 5 --save-dir ${checkpoint_dir} \
    \
    `# Logging Configs` \
    --log-format 'simple' --log-interval 100

My Decoding Script

average_checkpoint_path=${checkpoint_dir}/average.pt

python3 ./fs_plugins/scripts/average_checkpoints.py \
  --inputs ${checkpoint_dir} \
  --max-metric \
  --best-checkpoints-metric bleu \
  --num-best-checkpoints-metric 5 \
  --output ${average_checkpoint_path}

fairseq-generate ${data_dir} \
    --gen-subset test --user-dir fs_plugins --task translation_dat_task \
    --remove-bpe --max-tokens 4096 --seed 0 \
    --decode-strategy lookahead --decode-upsample-scale 4 --decode-beta 1  \
    --path ${average_checkpoint_path}

Pretrained model

Hi, do you have pretrained models to share with us?

Compilation Failed

python 3.7.12
pytorch 1.11.0+cu102
gcc 5.4

I have modified the cloneable.h file according to the FAQs section, but I still encounter the following error when the program is running. How can I fix it?

 
Traceback (most recent call last):  
File /home/env/nat/lib/python3.7/site-packages/torch/utils/cpp_extension.py, line 1746, in _run_ninja_build   env=env)
File /home/env/nat/lib/python3.7/subprocess.py, line 512, in run   output=stdout, stderr=stderr)  subprocess.CalledProcessError: Command [ninja, -v] returned non-zero exit status 1.
The above exception was the direct cause of the following exception:

RuntimeError: Error building extension 'dag_loss_fn': [1/2] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o 
FAILED: logsoftmax_gather.cuda.o 

/usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o 
/home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu:31:23: fatal error: cub/cub.cuh: No such file or directory compilation terminated.
ninja: build stopped: subcommand failed
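The fatal error says nvcc could not find `cub/cub.cuh` on any include path. To my knowledge, CUB is bundled with the CUDA toolkit only from 11.0 onward, so with a cu102 toolchain the header may have to be obtained separately and its directory added to the include path. The sketch below only checks whether the header exists. `CUDA_HOME` with the fallback `/usr/local/cuda` (the path seen in the log) is an assumption about the environment, and `find_cub_header` is a hypothetical helper.

```python
import os
from pathlib import Path


def find_cub_header(include_dirs):
    """Return the first existing <dir>/cub/cub.cuh, or None if absent."""
    for d in include_dirs:
        candidate = Path(d) / "cub" / "cub.cuh"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: the toolkit lives under CUDA_HOME or /usr/local/cuda.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    print(find_cub_header([Path(cuda_home) / "include"]))
```

If this prints `None`, the compile error above is expected regardless of the cloneable.h change.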

Want to get the top 10 translation results

I want to get the best 10 translations through the search algorithm, but I don't know how to do it.

model miniaturization

Hi, I tried to train a miniaturized model with a 6-layer encoder, a 3-layer decoder, and 256 hidden dimensions, but found that the accuracy of the model declines rapidly. Are there any suggestions for model miniaturization? Thanks.

training config

To replicate and build upon your results, I need a full understanding of the training configuration used in the experiments. Is examples/DA-Transformer/wmt14_ende.sh the config used to get the results in your paper? I found it impossible to finish 300,000 updates within 16 hours on 8 A100 GPUs using that config.
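For reference, the throughput implied by those two numbers can be worked out directly. This is plain arithmetic on the figures quoted in the question (300,000 updates, 16 hours), and `required_throughput` is a hypothetical helper.

```python
def required_throughput(total_updates, hours):
    """Return (updates per second, seconds per update) for the given budget."""
    seconds = hours * 3600
    return total_updates / seconds, seconds / total_updates


ups, spu = required_throughput(300_000, 16)
print(f"{ups:.2f} updates/s, {spu:.3f} s/update")  # 5.21 updates/s, 0.192 s/update
```

Comparing the measured time per update in the training log against 0.192 s shows how far a given setup is from that budget.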

Divide by zero error

Hello, great work! Since there is no nvcc on the server shared by our laboratory, I chose to use torch to calculate the DAG loss. When running, I find that logging_outputs and ntokens are 0, and the error is as follows:

There is a divide-by-zero error, which I suspect is caused by a version mismatch. My experimental environment is as follows:
pytorch and cuda version: 1.10.1+cu102, Python 3.7.11, gcc version 7.5.0, fairseq-1.0.0a0+2d06841
Could you suggest a solution to the problem above? Thank you very much! @hzhwcmhf

Errors when executing the script for generating the binarized data

Steps to reproduce the error:
1. Running git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git && pip install -e . as a single command did not work. Running git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git alone, then cd DA-Transformer, then pip install -e . works fine.
2. I tried to use the script in the README to generate the binarized data:

input_dir=path/to/raw_data        # directory of pre-processed text data
data_dir=path/to/binarized_data   # directory of the generated binarized data
src=src                           # source suffix
tgt=tgt                           # target suffix
fairseq-datpreprocess --source-lang ${src} --target-lang ${tgt} \
    --trainpref ${input_dir}/train --validpref ${input_dir}/valid --testpref ${input_dir}/test \
    --src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt \
    --destdir ${data_dir} --workers 32 \
    --user-dir fs_plugins --task translation_dat_task [--seg-tokens 32]

# seg-tokens should be set to 32 when you use pre-trained models.

I don't know what's going wrong. Please help me.

RuntimeError

python 3.7
pytorch 1.10.1+cu111
gcc 5.4.0

I have modified the cloneable.h file according to the FAQs section, but I still encounter the following error when the program is running. I have also tried running the code with gcc 7.5.0, and the same error appears. How can I fix it?

Can the output model be converted to ONNX format?

As the title says.

Runtime error when using live demo

Hello,

I encountered a runtime error when using the live demo on HuggingFace Space. The error message is as follows:

error message

Runtime error

failed to create containerd task: failed to create shim task: context canceled: unknown

Container logs:

===== Application Startup at 2023-09-03 01:29:23 =====

2023-09-03 01:40:53 | INFO | __main__ | args: Namespace(host='0.0.0.0', port=None, concurrency_count=1, share=False)
/home/user/app/app.py:421: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  model_selector = gr.Dropdown(
/home/user/app/app.py:327: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  model_selector = gr.Dropdown(

Link

https://huggingface.co/spaces/thu-coai/DA-Transformer