thu-coai / da-transformer
Official Implementation for the ICML2022 paper "Directed Acyclic Transformer for Non-Autoregressive Machine Translation"
License: Other
Hello! I ran the code successfully. However, when the process reaches this step, it hangs and does not continue, without reporting any error. Do you have any advice about this problem?
2022-10-18 17:05:27 | INFO | fairseq.utils | ***********************CUDA enviroments for all 4 workers***********************
2022-10-18 17:05:27 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2022-10-18 17:05:27 | INFO | fairseq_cli.train | max tokens per device = 2048 and max sentences per device = None
2022-10-18 17:05:27 | INFO | fairseq.trainer | Preparing to load checkpoint ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | No existing checkpoint found ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | loading train data for epoch 1
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.en
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.de
2022-10-18 17:05:28 | INFO | fairseq.tasks.translation | ./bin_data/WMT16 train en-de 4500966 examples
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:35 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1278
2022-10-18 17:05:35 | INFO | fairseq.trainer | begin training epoch 1
2022-10-18 17:05:35 | INFO | fairseq_cli.train | Start iterating over samples
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Thanks for your good work. I wonder how much speedup the CUDA operations give compared with PyTorch native operations. Also, is there a good tutorial for getting started with CUDA programming?
Hi,
Thanks for your nice paper and code!
Would you like to share the distilled datasets used in this paper?
Hello, this is great work on NAT and I like it. I tried to modify the src-upsample-scale and make the lambda smaller, like 2 or 4, but it raises an error: "dag_best_alignment.cu:68: calculate_maxalpha_kernel: block: [0,77,0], thread: [0,244,0] Assertion output_len >= target_len && "dag_best_alignment: graph size is too small (smaller than target length)" failed."
Do you know how to fix this error? Thank you.
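The assertion in the message compares the graph size with the target length. Below is a minimal sketch of that check, assuming the graph size is the source length multiplied by the upsample scale (as `--upsample-base source` suggests). The function name, the rounding, and the sample lengths are illustrative assumptions, not code or data from the repository.

```python
def graph_too_small(src_len: int, tgt_len: int, upsample_scale: float) -> bool:
    """Return True if the graph would have fewer vertices than target tokens.

    Assumption (not taken from the repository): graph size is
    int(src_len * upsample_scale).
    """
    graph_size = int(src_len * upsample_scale)
    return graph_size < tgt_len


# Hypothetical (source length, target length) pairs.
pairs = [(10, 18), (10, 25), (6, 30)]

for scale in (2, 4, 8):
    failing = [p for p in pairs if graph_too_small(p[0], p[1], scale)]
    print(f"scale={scale}: pairs that would trip the assertion: {failing}")
```

Under this assumption, lowering the scale makes more pairs fail the check, so pairs whose target is longer than the scaled source length would have to be filtered out of the data, or the scale raised.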
Hello, I tried to reproduce the setting with factor=4 and Lookahead decoding, but got 25.64 BLEU, which is lower than the 26.14 BLEU reported in the paper on WMT'14 EN-DE raw data. I used the same environment, the same training script, the same decoding script, and the same dataset, but still could not match it. Can you help me? Or can you share the checkpoints on WMT14 EN-DE raw data and distilled data?
fairseq-train ${data_dir} \
\
`# loading DA-Transformer plugins` \
--user-dir fs_plugins \
\
`# DA-Transformer Task Configs` \
--task translation_dat_task \
--upsample-base source --upsample-scale 4 \
--filter-max-length 128:1024 --filter-ratio 2 \
--skip-invalid-size-inputs-valid-test \
\
`# DA-Transformer Architecture Configs` \
--arch glat_decomposed_link_base \
--links-feature feature:position \
--max-source-positions 128 --max-target-positions 1024 \
--encoder-learned-pos --decoder-learned-pos \
--share-all-embeddings --activation-fn gelu --apply-bert-init \
\
`# DA-Transformer Decoding Configs (See more in the decoding section)` \
--decode-strategy lookahead --decode-upsample-scale 4.0 \
\
`# DA-Transformer Criterion Configs` \
--criterion nat_dag_loss \
--length-loss-factor 0 --max-transition-length 99999 \
--glat-p 0.5:0.1@200k --glance-strategy number-random \
--no-force-emit \
\
`# Optimizer & Regularizer Configs` \
--optimizer adam --adam-betas '(0.9,0.999)' --fp16 \
--label-smoothing 0.0 --weight-decay 0.01 --dropout 0.1 \
--lr-scheduler inverse_sqrt --warmup-updates 10000 \
--clip-norm 0.1 --lr 0.0005 --warmup-init-lr '1e-07' --stop-min-lr '1e-09' \
\
`# Training Configs` \
--max-tokens 32392 --max-tokens-valid 4096 --update-freq 1 \
--max-update 300000 --grouped-shuffling \
--max-encoder-batch-tokens 8000 --max-decoder-batch-tokens 34000 \
--seed 0 --ddp-backend c10d --required-batch-size-multiple 1 \
\
`# Validation Configs` \
--valid-subset valid \
--validate-interval 1 --validate-interval-updates 10000 \
--eval-bleu --eval-bleu-detok space --eval-bleu-remove-bpe --eval-bleu-print-samples --eval-tokenized-bleu \
--fixed-validation-seed 7 \
\
`# Checkpoint Configs` \
--best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
--save-interval 1 --save-interval-updates 10000 \
--keep-best-checkpoints 5 --save-dir ${checkpoint_dir} \
\
`# Logging Configs` \
--log-format 'simple' --log-interval 100
average_checkpoint_path=${checkpoint_dir}/average.pt
python3 ./fs_plugins/scripts/average_checkpoints.py \
--inputs ${checkpoint_dir} \
--max-metric \
--best-checkpoints-metric bleu \
--num-best-checkpoints-metric 5 \
--output ${average_checkpoint_path}
fairseq-generate ${data_dir} \
--gen-subset test --user-dir fs_plugins --task translation_dat_task \
--remove-bpe --max-tokens 4096 --seed 0 \
--decode-strategy lookahead --decode-upsample-scale 4 --decode-beta 1 \
--path ${average_checkpoint_path}
Hi, do you have pretrained models to share with us?
I have modified the cloneable.h file according to the FAQs section, but I still encounter the following error when the program is running. Please tell me how I can fix it.
Traceback (most recent call last): File /home/env/nat/lib/python3.7/site-packages/torch/utils/cpp_extension.py, line 1746, in _run_ninja_build env=env) File /home/env/nat/lib/python3.7/subprocess.py, line 512, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command [ninja, -v] returned non-zero exit status 1. The above exception was the direct cause of the following exception: RuntimeError: Error building extension 'dag_loss_fn': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o FAILED: logsoftmax_gather.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=dag_loss_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/env/nat/lib/python3.7/site-packages/torch/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/TH -isystem /home/env/nat/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem 
/home/env/nat/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -DOF_SOFTMAX_USE_FAST_MATH -std=c++14 -c /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu -o logsoftmax_gather.cuda.o /home/DA-Transformer/fs_plugins/custom_ops/logsoftmax_gather.cu:31:23: fatal error: cub/cub.cuh: No such file or directory compilation terminated. ninja: build stopped: subcommand failed
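The build stops because nvcc cannot find `cub/cub.cuh`. Here is a small sketch for checking whether that header exists in the include directories passed to nvcc. The two directories are copied from the `-isystem` flags in the log above; the helper name `find_cub_header` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def find_cub_header(include_dirs: Iterable[str]) -> List[Path]:
    """Return the path of cub/cub.cuh for every directory that contains it."""
    found = []
    for directory in include_dirs:
        candidate = Path(directory) / "cub" / "cub.cuh"
        if candidate.is_file():
            found.append(candidate)
    return found


# Include directories taken from the -isystem flags in the failing command.
dirs = [
    "/usr/local/cuda/include",
    "/home/env/nat/lib/python3.7/site-packages/torch/include",
]
hits = find_cub_header(dirs)
print(hits if hits else "cub/cub.cuh not found in the listed include directories")
```

As far as I know, CUB ships with the CUDA Toolkit from version 11.0 onward, so an empty result usually points to an older toolkit where the header has to be supplied separately.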
I want to get the best 10 translations through the search algorithm, but I don't know how to do it.
Hi, I tried to train a smaller model with a 6-layer encoder, a 3-layer decoder, and 256 hidden dimensions, but found that the accuracy of the model declines rapidly. Do you have any suggestions for shrinking the model? Thanks.
To replicate and build upon your results, it is crucial for me to have a comprehensive understanding of the training configuration employed during the experiments. Is examples/DA-Transformer/wmt14_ende.sh the config used to get the results in your paper? I found it impossible to finish 300,000 updates within 16 hours on 8 A100 GPUs using that config.
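For reference, the throughput implied by those two numbers can be worked out directly. This sketch uses only the 300,000 updates and 16 hours quoted above; the function name is illustrative.

```python
def required_rate(total_updates: int, hours: float):
    """Return (updates per second, seconds per update) needed to finish in time."""
    seconds = hours * 3600
    return total_updates / seconds, seconds / total_updates


updates_per_sec, sec_per_update = required_rate(300_000, 16)
print(f"{updates_per_sec:.2f} updates/s, {sec_per_update:.3f} s per update")
# prints: 5.21 updates/s, 0.192 s per update
```

So finishing in 16 hours would require each update, across all GPUs, to complete in about 0.19 seconds.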
Hello, great work! Since there is no nvcc on the server shared by our laboratory, I chose to use torch to calculate the DAG loss. When running, I find that logging_outputs and ntokens are 0, and the error is as follows:
There is a divide-by-zero error, which I suspect is caused by a version mismatch. My experimental environment is as follows:
PyTorch and CUDA version: 1.10.1+cu102; Python 3.7.11; gcc version 7.5.0; fairseq-1.0.0a0+2d06841
Could you suggest a solution to the above problems? Thank you very much! @hzhwcmhf
Steps to reproduce the error:
1. git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git && pip install -e .
This did not work. Running git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git alone, then cd DA-Transformer and pip install -e . works fine.
2. I tried to use the script in the README to generate binarized data:
input_dir=path/to/raw_data # directory of pre-processed text data
data_dir=path/to/binarized_data # directory of the generated binarized data
src=src # source suffix
tgt=tgt # target suffix
fairseq-datpreprocess --source-lang ${src} --target-lang ${tgt} \
--trainpref ${input_dir}/train --validpref ${input_dir}/valid --testpref ${input_dir}/test \
--src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt \
--destdir ${data_dir} --workers 32 \
--user-dir fs_plugins --task translation_dat_task [--seg-tokens 32]
# seg-tokens should be set to 32 when you use pre-trained models.
as title
Hello,
I encountered a runtime error when using the live demo on HuggingFace Space. The error message is as follows:
failed to create containerd task: failed to create shim task: context canceled: unknown
===== Application Startup at 2023-09-03 01:29:23 =====
2023-09-03 01:40:53 | INFO | __main__ | args: Namespace(host='0.0.0.0', port=None, concurrency_count=1, share=False)
/home/user/app/app.py:421: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
model_selector = gr.Dropdown(
/home/user/app/app.py:327: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
model_selector = gr.Dropdown(
https://huggingface.co/spaces/thu-coai/DA-Transformer