mattiadg / fbk-fairseq-st
An adaptation of Fairseq to (End-to-end) speech translation.
License: Other
Running the command provided in the readme to reproduce the results on MuST-C of the paper "Adapting Transformer to End-to-End Spoken Language Translation" results in the following error:
| distributed init (rank 1): tcp://localhost:18735
| distributed init (rank 0): tcp://localhost:18735
| distributed init (rank 2): tcp://localhost:18735
| distributed init (rank 3): tcp://localhost:18735
Namespace(adam_betas='(0.9, 0.999)', adam_eps=1e-08, arch='speechconvtransformer_big', attention_dropout=0.1, attn_2d=True, audio_input=True, bucket_cap_mb=150, clip_norm=20.0, criterion='label_smoothed_cross_entropy', data=['bin/'], ddp_backend='no_c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_ffn_embed_dim=1024, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, decoder_out_embed_dim=512, decoder_output_dim=512, device_id=0, distance_penalty='gauss', distributed_backend='nccl', distributed_init_host='localhost', distributed_init_method='tcp://localhost:18735', distributed_port=18736, distributed_rank=0, distributed_world_size=4, dropout=0.1, encoder_attention_heads=8, encoder_convolutions='[(64, 3, 3)] * 2', encoder_embed_dim=512, encoder_ffn_embed_dim=1024, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_window=None, freeze_encoder=False, init_variance=1.0, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.005], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=100, max_sentences=8, max_sentences_valid=8, max_source_positions=1400, max_target_positions=300, max_tokens=12000, max_update=0, min_loss_scale=0.0001, min_lr=1e-08, momentum=0.99, no_attn_2d=False, no_cache_source=False, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, normalization_constant=1.0, optimizer='adam', optimizer_overrides='{}', raw_text=False, relu_dropout=0.1, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='models', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=True, skip_invalid_size_inputs_valid_test=True, source_lang=None, target_lang=None, task='translation', train_subset='train', update_freq=[16], upsample_primary=1, valid_subset='valid', validate_interval=1, 
warmup_init_lr=0.0003, warmup_updates=4000, weight_decay=0.0)
| [h5] dictionary: 4 types
| [de] dictionary: 192 types
| bin/ train 229703 examples
| bin/ valid 1423 examples
Exception ignored in: <function IndexedDataset.__del__ at 0x7f0de0de5790>
Traceback (most recent call last):
File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
Traceback (most recent call last):
File "../../train.py", line 365, in <module>
Exception ignored in: <function IndexedDataset.__del__ at 0x7f9f0b8f3790>
Traceback (most recent call last):
File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
def __del__(self):
KeyboardInterrupt:
multiprocessing_main(args)
def __del__(self):
File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 42, in main
KeyboardInterrupt:
p.join()
File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 84, in signal_handler
raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --
Traceback (most recent call last):
File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 48, in run
single_process_main(args)
File "/home/amit/amit/pruning/FBK-Fairseq-ST/train.py", line 53, in main
dummy_batch = task.dataset('train').get_dummy_batch(args.max_tokens, max_positions)
File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 221, in get_dummy_batch
return self.collater([
File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 224, in <listcomp>
'source': self.src_dict.dummy_sentence(src_len) if self.src_dict is not None else None,
File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/dictionary.py", line 302, in dummy_sentence
t = torch.Tensor(length).new_empty((length, self.audio_features)).uniform_(self.nspecial + 1, len(self))
RuntimeError: Expected a_in <= b_in to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
I've tried running the command with Python 3.5 and Python 3.8 and I get the same error both times.
I believe the error occurs because the bounds passed to uniform_ are invalid: the lower bound is greater than the upper bound.
I tried fixing the error myself by changing self.nspecial + 1 to self.nspecial on line 302 of FBK-Fairseq-ST/fairseq/data/dictionary.py (at commit 2d15240).
Is this a valid fix?
Thanks in advance,
Chaitanya
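To see why the original call fails, it can help to compute the two bounds by hand. The sketch below is pure Python and does not call PyTorch. `uniform_bounds` and `is_valid_range` are hypothetical helpers, and `nspecial = 4` is an assumption: the log only shows that the h5 dictionary has 4 types, which suggests it holds nothing but special symbols.

```python
def uniform_bounds(nspecial, vocab_size, offset=1):
    """Return the (low, high) pair that dummy_sentence would pass to uniform_().

    Hypothetical helper mirroring `uniform_(self.nspecial + 1, len(self))`
    from the traceback; offset=0 models the proposed change.
    """
    return nspecial + offset, vocab_size


def is_valid_range(low, high):
    # uniform_ requires low <= high; this is the condition the RuntimeError reports.
    return low <= high


# Assumed values: the log prints "[h5] dictionary: 4 types"; nspecial = 4 is a guess.
original = uniform_bounds(nspecial=4, vocab_size=4)           # (5, 4)
patched = uniform_bounds(nspecial=4, vocab_size=4, offset=0)  # (4, 4)

print(original, is_valid_range(*original))
print(patched, is_valid_range(*patched))
```

Under these assumptions the patched bounds pass the check but collapse to a single value, so the dummy batch would presumably be constant. Whether that matters for this code path is something only the maintainers can confirm.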
Hi,
I think the command example given for binarizing data is not correct:
python preprocess.py -s <audio_format> -t fr --format <h5 | npz> --inputtype audio \
--trainpref <path_to_train_data> [[--validpref <path_to_validation_data>] \
[--testpref <path_to_test_data>]] --destdir <path to output folder>
and should be
python preprocess.py -s <source_language> -t <target_language> --format <h5 | npz> --inputtype audio \
--trainpref <path_to_train_data> [[--validpref <path_to_validation_data>] \
[--testpref <path_to_test_data>]] --destdir <path to output folder>
I came across this while trying to run the example.
Many thanks
Hello @mattiadg
When I don't use the --distance-penalty flag I get the following error:
File "~/FBK-Fairseq-ST/fairseq/models/s_transformer.py", line 472, in __init__
init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
TypeError: __init__() got an unexpected keyword argument 'penalty'
The problem comes from the following lines in the constructor of TransformerEncoderLayer:
attn = LocalMultiheadAttention if args.distance_penalty != False else MultiheadAttention
self.self_attn = attn(
    self.embed_dim, args.encoder_attention_heads,
    dropout=args.attention_dropout, penalty=args.distance_penalty,
    init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
)
The arguments penalty and init_variance do not exist in MultiheadAttention, so I replaced these lines with:
if args.distance_penalty != False:
    self.self_attn = LocalMultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, penalty=args.distance_penalty,
        init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
    )
else:
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout,
    )
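The same idea can be written by choosing the class and its keyword arguments together, so that penalty and init_variance are only passed to the class that accepts them. This is a minimal, self-contained sketch: the two classes below are stand-ins with assumed signatures, not the real modules from the repository, and `build_self_attn` is a hypothetical helper.

```python
from types import SimpleNamespace


class MultiheadAttention:
    """Stand-in for the real module: accepts no penalty arguments."""

    def __init__(self, embed_dim, num_heads, dropout=0.0):
        self.embed_dim, self.num_heads, self.dropout = embed_dim, num_heads, dropout


class LocalMultiheadAttention(MultiheadAttention):
    """Stand-in for the distance-penalty variant with its two extra arguments."""

    def __init__(self, embed_dim, num_heads, dropout=0.0, penalty=None, init_variance=None):
        super().__init__(embed_dim, num_heads, dropout)
        self.penalty, self.init_variance = penalty, init_variance


def build_self_attn(args, embed_dim):
    # Keyword arguments shared by both attention classes.
    kwargs = {"dropout": args.attention_dropout}
    if args.distance_penalty != False:  # same test as in the code above
        attn_cls = LocalMultiheadAttention
        kwargs["penalty"] = args.distance_penalty
        kwargs["init_variance"] = (
            args.init_variance if args.distance_penalty == "gauss" else None
        )
    else:
        attn_cls = MultiheadAttention
    return attn_cls(embed_dim, args.encoder_attention_heads, **kwargs)


base = dict(attention_dropout=0.1, init_variance=1.0, encoder_attention_heads=8)
plain = build_self_attn(SimpleNamespace(distance_penalty=False, **base), 512)
local = build_self_attn(SimpleNamespace(distance_penalty="gauss", **base), 512)
print(type(plain).__name__, type(local).__name__)
```

Building the keyword dictionary first keeps the shared arguments in one place, at the cost of a slightly less direct call site than the if/else above.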
Hi,
I'm trying to run the transformer-based model, but I get an error because ConvAttention2D cannot be found. According to the code, it should be in the modules folder, but I could not find it there.
Thank you in advance,
Carlos
I have tried to train a model with the following parameters:
python FBK-Fairseq-ST/train.py path/to/binarized/data \
--clip-norm 5 --max-sentences 32 --max-tokens 100000 --save-dir model/ --max-epoch 150 \
--lr 0.001 --lr-shrink 1.0 --min-lr 1e-08 --dropout 0.2 --lr-schedule fixed --optimizer adam \
--arch ast_seq2seq --decoder-attention True --seed 666 --task translation \
--skip-invalid-size-inputs-valid-test --sentence-avg --attention-type general \
--learn-initial-state --criterion label_smoothed_cross_entropy --label-smoothing 0.1
It results in: FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.npz.txt'
The output of the binarization process however does not include the source language dictionary:
python FBK-Fairseq-ST/preprocess.py -s npz -t tok --format npz --inputtype audio \
--trainpref /path/non-binarized/data \
--destdir /path/binarized/data
Files in path/to/binarized/data/
---------------> train.npz-tok.idx
---------------> train.npz-tok.bin
---------------> train.npz-tok.npz.idx
---------------> train.npz-tok.npz.bin
---------------> dict.tok.txt
This seems correct, as far as I understand from Mattia Di Gangi's article on Medium: "we have a dictionary for the target language (dict.it.txt), and for each split of the data, an index and a content file for the source side (*.h5.idx and *.h5.bin) and the same for the target side (*.it.idx and *.it.bin)".
Then why does the script FBK-Fairseq-ST/fairseq/data/dictionary.py attempt to open dict.npz.txt (the source language dictionary)?
The problem also arises when using the MuST-C English-Italian dataset (h5 instead of npz):
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.h5.txt'
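A quick way to confirm which dictionary files a binarized folder actually contains is to check for them before launching training. The sketch below is only a diagnostic aid: `missing_dictionaries` is a hypothetical helper, not part of FBK-Fairseq-ST, and the dict.<lang>.txt naming is taken from the file listing and error messages above. It does not explain why the source-side dictionary is requested.

```python
import tempfile
from pathlib import Path


def missing_dictionaries(data_dir, langs):
    """Return the dict.<lang>.txt files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [
        f"dict.{lang}.txt"
        for lang in langs
        if not (data_dir / f"dict.{lang}.txt").is_file()
    ]


# Recreate the situation above in a temporary folder: only dict.tok.txt exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dict.tok.txt").write_text("")
    missing = missing_dictionaries(tmp, ["npz", "tok"])

print(missing)  # ['dict.npz.txt']
```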
Hi, thank you for providing the repository.
Could you please guide me on how to prepare my dataset so that I can run the experiment?
The current dataset structure is as follows:
Source language:
source1.wav
source1.txt (transcript of source1.wav)
source2.wav
source2.txt
....
Target language:
target1.txt ( translation of source1.txt)
target2.txt
....
I have also gone through the tutorial "Getting Started with End-to-End Speech Translation", but I could not understand how I should prepare or arrange my dataset for FBK-Fairseq-ST. Should I create a CSV file with the wav file names (source language) in the first column and the text (target language) in the second column, or some other JSON/CSV file that maps each audio file to its text file?
According to the tutorial, I first have to prepare a pre-trained ASR model for FBK-Fairseq-ST.
I am new to this field and would be thankful for any guidance.
Thank you.
Hi,
I'm trying to reproduce the EN-IT experiment on the MuST-C corpus and ran into this issue while training:
Traceback (most recent call last):
File "train.py", line 367, in <module>
main(args)
File "train.py", line 73, in main
shard_id=args.distributed_rank,
File "FBK-Fairseq-ST/fairseq/tasks/fairseq_task.py", line 96, in get_batch_iterator
indices = dataset.ordered_indices()
File "FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 250, in ordered_indices
indices = indices[np.argsort(self.tgt_sizes[indices], kind='mergesort')]
IndexError: index 216490 is out of bounds for axis 0 with size 0
It seems that the tgt_sizes NumPy array has shape (0,), which is causing the issue. Could you please guide me on resolving it? Thanks!
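A check of this kind before sorting would turn the bare IndexError into a more descriptive message. This is a minimal sketch using plain Python lists instead of NumPy arrays; `check_sizes` is a hypothetical helper, not a function from the repository, and it only reports the symptom without identifying why the target sizes are empty.

```python
def check_sizes(indices, tgt_sizes):
    """Raise a descriptive error if any index falls outside tgt_sizes."""
    if len(tgt_sizes) == 0:
        raise ValueError("tgt_sizes is empty: no target lengths are available")
    bad = [i for i in indices if not 0 <= i < len(tgt_sizes)]
    if bad:
        raise ValueError(
            f"{len(bad)} indices fall outside tgt_sizes (length {len(tgt_sizes)})"
        )


message = ""
try:
    # Mirrors the failing case: index 216490 against an array of size 0.
    check_sizes([216490], [])
except ValueError as err:
    message = str(err)

print(message)
```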
Hi,
Thank you for your work! Here are a couple of queries that I had:
1. If I wanted to train the ST model on a custom dataset, how would I go about it? What format does the data folder need to be in?
2. Are there any pre-trained models available for ST? If so, can we fine-tune that on a custom dataset?
Thanks!
@mattiadg
I'm currently training on a very large dataset with 4 GPUs and I get a CUDA out-of-memory error after the first training epoch completes: as soon as validation starts, the process runs out of memory.
Here is the exact message:
Tried to allocate 7.93 GiB (GPU 2; 22.38 GiB total capacity; 11.55 GiB already allocated; 3.53 GiB free; 6.75 GiB cached)
Is this a memory leak? Is there an issue with emptying the cache, or do I just need to reduce the batch size/max tokens? (I have already tried halving the batch size and the same error occurs.)
Thanks!
I am currently running speech translation experiments with FBK-Fairseq-ST from GitHub, but I have run into problems because my corpus is too small. I looked for the MuST-C corpus on its website, but it seems that it can no longer be downloaded. If possible, could you send me a copy of the corpus? I would be very grateful.
Thank you very much!
When running preprocess.py, an error pops up saying "ValueError: Cannot register model architecture for unknown model type (r_transformer_lm)".
@mattiadg
Once the model is trained, how can I use it to generate translations for a single audio file or a group of audio files? These files have no ground-truth (GT) translations associated with them. generate.py requires a binarized folder, so it will not work as is.