mattiadg / fbk-fairseq-st

An adaptation of Fairseq to (End-to-end) speech translation.

License: Other

Python 99.03% C++ 0.42% Lua 0.52% Shell 0.02%

fbk-fairseq-st's Issues

Command to reproduce results on MuST-C failing

Running the command provided in the readme to reproduce the results on MuST-C of the paper "Adapting Transformer to End-to-End Spoken Language Translation" results in the following error:

| distributed init (rank 1): tcp://localhost:18735
| distributed init (rank 0): tcp://localhost:18735
| distributed init (rank 2): tcp://localhost:18735
| distributed init (rank 3): tcp://localhost:18735
Namespace(adam_betas='(0.9, 0.999)', adam_eps=1e-08, arch='speechconvtransformer_big', attention_dropout=0.1, attn_2d=True, audio_input=True, bucket_cap_mb=150, clip_norm=20.0, criterion='label_smoothed_cross_entropy', data=['bin/'], ddp_backend='no_c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_ffn_embed_dim=1024, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, decoder_out_embed_dim=512, decoder_output_dim=512, device_id=0, distance_penalty='gauss', distributed_backend='nccl', distributed_init_host='localhost', distributed_init_method='tcp://localhost:18735', distributed_port=18736, distributed_rank=0, distributed_world_size=4, dropout=0.1, encoder_attention_heads=8, encoder_convolutions='[(64, 3, 3)] * 2', encoder_embed_dim=512, encoder_ffn_embed_dim=1024, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_window=None, freeze_encoder=False, init_variance=1.0, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.005], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=100, max_sentences=8, max_sentences_valid=8, max_source_positions=1400, max_target_positions=300, max_tokens=12000, max_update=0, min_loss_scale=0.0001, min_lr=1e-08, momentum=0.99, no_attn_2d=False, no_cache_source=False, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, normalization_constant=1.0, optimizer='adam', optimizer_overrides='{}', raw_text=False, relu_dropout=0.1, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='models', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=True, skip_invalid_size_inputs_valid_test=True, source_lang=None, target_lang=None, task='translation', train_subset='train', update_freq=[16], upsample_primary=1, valid_subset='valid', validate_interval=1, 
warmup_init_lr=0.0003, warmup_updates=4000, weight_decay=0.0)
| [h5] dictionary: 4 types
| [de] dictionary: 192 types
| bin/ train 229703 examples
| bin/ valid 1423 examples
Exception ignored in: <function IndexedDataset.__del__ at 0x7f0de0de5790>
Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
Traceback (most recent call last):
  File "../../train.py", line 365, in <module>
Exception ignored in: <function IndexedDataset.__del__ at 0x7f9f0b8f3790>
Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
    def __del__(self):
KeyboardInterrupt: 
    multiprocessing_main(args)
    def __del__(self):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 42, in main
KeyboardInterrupt: 
    p.join()
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 84, in signal_handler
    raise Exception(msg)
Exception: 

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 48, in run
    single_process_main(args)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/train.py", line 53, in main
    dummy_batch = task.dataset('train').get_dummy_batch(args.max_tokens, max_positions)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 221, in get_dummy_batch
    return self.collater([
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 224, in <listcomp>
    'source': self.src_dict.dummy_sentence(src_len) if self.src_dict is not None else None,
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/dictionary.py", line 302, in dummy_sentence
    t = torch.Tensor(length).new_empty((length, self.audio_features)).uniform_(self.nspecial + 1, len(self))
RuntimeError: Expected a_in <= b_in to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

I've tried running the command with both Python 3.5 and Python 3.8, and I get the same error each time.
I believe the error occurs because the bounds passed to uniform_ are invalid: the message says the lower bound must not exceed the upper bound, and here self.nspecial + 1 is apparently larger than len(self).

I tried fixing the error myself by changing self.nspecial + 1 to self.nspecial in the following line:

FBK-Fairseq-ST/fairseq/data/dictionary.py

Line 302 in 2d15240

 t = torch.Tensor(length).new_empty((length, self.audio_features)).uniform_(self.nspecial + 1, len(self)) 

Is this a valid fix?

Thanks in advance,
Chaitanya
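
The bounds problem described in this issue can be sketched in plain Python, so it runs without PyTorch. The sketch assumes, based on the log above, that the [h5] source dictionary has 4 entries and that all of them are special symbols (so nspecial is 4, which is a guess). dummy_value_bounds is a hypothetical helper, not a function of FBK-Fairseq-ST.

```python
from typing import Tuple


def dummy_value_bounds(nspecial: int, vocab_size: int) -> Tuple[int, int]:
    """Return a (low, high) pair that is always valid for a uniform draw.

    Hypothetical helper: it mirrors the arguments of the failing uniform_
    call (self.nspecial + 1, len(self)) but clamps the lower bound so that
    low <= high also holds for a dictionary made only of special symbols.
    """
    high = vocab_size
    low = min(nspecial + 1, high)
    return low, high


# Assumed values: the log reports "[h5] dictionary: 4 types"; nspecial = 4 is a guess.
audio_bounds = dummy_value_bounds(nspecial=4, vocab_size=4)
# The target dictionary has 192 types, so the original bounds are already valid there.
text_bounds = dummy_value_bounds(nspecial=4, vocab_size=192)

print(audio_bounds, text_bounds)
```

Under these assumed values the clamped range for the audio side collapses to a single value, which has the same effect as replacing self.nspecial + 1 with self.nspecial. Whether that is the fix the maintainers intended is not confirmed in this thread.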

Readme preprocessing command

Hi,

I think the example command for binarizing data given in the README is not correct:

python preprocess.py -s <audio_format> -t fr --format <h5 | npz> --inputtype audio \
	--trainpref <path_to_train_data> [[--validpref <path_to_validation_data>] \
	[--testpref <path_to_test_data>]] --destdir <path to output folder>

and should be:

python preprocess.py -s <source_language> -t <target_language> --format <h5 | npz> --inputtype audio \
	--trainpref <path_to_train_data> [[--validpref <path_to_validation_data>] \
	[--testpref <path_to_test_data>]] --destdir <path to output folder>

I came across this while trying to run the example.

Many thanks

Error when not using any distance penalty

Hello @mattiadg

When I don't use the --distance-penalty flag I get the following error:

File "~/FBK-Fairseq-ST/fairseq/models/s_transformer.py", line 472, in __init__
    init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
TypeError: __init__() got an unexpected keyword argument 'penalty'

The problem comes from the following lines in the constructor of TransformerEncoderLayer:

attn = LocalMultiheadAttention if args.distance_penalty != False else MultiheadAttention	
self.self_attn = attn(
	self.embed_dim, args.encoder_attention_heads,
	dropout=args.attention_dropout, penalty=args.distance_penalty,
	init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
)

The arguments penalty and init_variance do not exist in MultiheadAttention, so I replaced these lines with:

if args.distance_penalty != False:
    self.self_attn = LocalMultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, penalty=args.distance_penalty,
        init_variance=(args.init_variance if args.distance_penalty == 'gauss' else None)
    )
else:
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout,
    )
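
The same dispatch can also be written by building the keyword arguments first and passing only those that the chosen class accepts. This is a minimal, self-contained sketch: the two classes below are stand-ins that only imitate the constructor signatures implied by the error message, build_self_attn is a hypothetical helper, and the truthiness test assumes that an unset --distance-penalty shows up as False (or None).

```python
from types import SimpleNamespace


class MultiheadAttention:
    """Stand-in for the real module; only the constructor signature matters here."""

    def __init__(self, embed_dim, num_heads, dropout=0.0):
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.dropout = dropout


class LocalMultiheadAttention(MultiheadAttention):
    """Stand-in that additionally accepts the distance-penalty arguments."""

    def __init__(self, embed_dim, num_heads, dropout=0.0, penalty=None, init_variance=None):
        super().__init__(embed_dim, num_heads, dropout=dropout)
        self.penalty = penalty
        self.init_variance = init_variance


def build_self_attn(args, embed_dim):
    """Pick the attention class and pass only the keyword arguments it accepts."""
    kwargs = {"dropout": args.attention_dropout}
    if args.distance_penalty:  # assumed: False (or None) means "no penalty"
        kwargs["penalty"] = args.distance_penalty
        kwargs["init_variance"] = (
            args.init_variance if args.distance_penalty == "gauss" else None
        )
        attn_cls = LocalMultiheadAttention
    else:
        attn_cls = MultiheadAttention
    return attn_cls(embed_dim, args.encoder_attention_heads, **kwargs)


# Illustrative argument values taken from the Namespace style used by train.py.
args = SimpleNamespace(
    encoder_attention_heads=8,
    attention_dropout=0.1,
    distance_penalty=False,
    init_variance=1.0,
)
plain = build_self_attn(args, embed_dim=512)

args.distance_penalty = "gauss"
local = build_self_attn(args, embed_dim=512)

print(type(plain).__name__, type(local).__name__, local.init_variance)
```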

ConvAttention2D

Hi,

I'm trying to run the transformer-based model, but I get an error because ConvAttention2D cannot be found. According to the code, it should be in the modules folder, but I could not find it there.

Thank you in advance,

Carlos

ValueError: Cannot load file containing pickled data when allow_pickle=False

train.py, FileNotFoundError, asking for dict.source.txt

I have tried to train a model with the following parameters:

python FBK-Fairseq-ST/train.py path/to/binarized/data \
    --clip-norm 5 --max-sentences 32 --max-tokens 100000 --save-dir model/ --max-epoch 150 \
    --lr 0.001 --lr-shrink 1.0 --min-lr 1e-08 --dropout 0.2 --lr-schedule fixed --optimizer adam \
    --arch ast_seq2seq --decoder-attention True --seed 666 --task translation \
    --skip-invalid-size-inputs-valid-test --sentence-avg --attention-type general \
    --learn-initial-state --criterion label_smoothed_cross_entropy --label-smoothing 0.1

It results in FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.npz.txt'

The output of the binarization process however does not include the source language dictionary:

python FBK-Fairseq-ST/preprocess.py -s npz -t tok --format npz --inputtype audio \
--trainpref /path/non-binarized/data \
--destdir /path/binarized/data

Files in path/to/binarized/data/

---------------> train.npz-tok.idx
---------------> train.npz-tok.bin
---------------> train.npz-tok.npz.idx
---------------> train.npz-tok.npz.bin
---------------> dict.tok.txt

This seems correct, as far as I understand from Mattia Di Gangi's article on Medium: "we have a dictionary for the target language (dict.it.txt), and for each split of the data, an index and a content file for the source side (*.h5.idx and *.h5.bin) and the same for the target side (*.it.idx and *.it.bin)".

Then why does the script FBK-Fairseq-ST/fairseq/data/dictionary.py attempt to open dict.npz.txt (the source language dictionary)?

The problem also arises when using the MuST-C English-Italian dataset (h5 instead of npz):
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.h5.txt'
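
A quick way to confirm which dictionary files are missing before launching training is to check the binarized folder directly. This is a minimal sketch using only the dict.<lang>.txt naming seen in the error messages above; missing_dictionaries is a hypothetical helper, and the temporary directory only recreates the reported situation for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_dictionaries(data_dir: str, langs: List[str]) -> List[str]:
    """Return the dict.<lang>.txt file names that are absent from data_dir."""
    root = Path(data_dir)
    return [
        f"dict.{lang}.txt"
        for lang in langs
        if not (root / f"dict.{lang}.txt").is_file()
    ]


# Recreate the reported situation in a temporary directory: after
# binarization only the target-side dictionary (dict.tok.txt) exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dict.tok.txt").write_text("dummy 1\n")
    missing = missing_dictionaries(tmp, ["npz", "tok"])

print(missing)
```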

Training with custom dataset

Hi, thank you for providing the repository.

Could you please guide me, how should I prepare my dataset, so that I can run the experiment?

Current dataset structure is as follows:

Source language:
source1.wav
source1.txt (transcript of source1.wav)
source2.wav
source2.txt
....

Target language:
target1.txt (translation of source1.txt)
target2.txt
....

I have also gone through the tutorial "Getting Started with End-to-End Speech Translation", but I could not understand how to prepare or arrange my dataset to meet the FBK-Fairseq-ST requirements. Should I create a CSV file with the wav file names (source language) in the first column and the text (target language) in the next column, or some other JSON/CSV file that maps each audio file to its text file?

As per the tutorial, I have to prepare a pre-trained ASR model first for FBK-Fairseq-ST.

I am new to this field, and I would be thankful for any guidance.

Thank you.
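
The mapping idea in the question (one row per audio file and its translation) can be sketched as follows. This is only generic bookkeeping under the file naming shown above (sourceN.wav, targetN.txt); it is not the input format that preprocess.py expects, which this thread does not specify. numeric_id and pair_audio_with_translation are hypothetical helpers.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple


def numeric_id(name: str) -> int:
    """Extract the number that precedes the file extension, e.g. 'source12.wav' -> 12."""
    match = re.search(r"(\d+)\.\w+$", name)
    if match is None:
        raise ValueError(f"no numeric id in file name: {name!r}")
    return int(match.group(1))


def pair_audio_with_translation(
    wav_names: Iterable[str], translation_names: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair each wav file with the translation file that carries the same number."""
    wavs = {numeric_id(name): name for name in wav_names}
    texts = {numeric_id(name): name for name in translation_names}
    shared_ids = sorted(wavs.keys() & texts.keys())  # ignore unmatched files
    return [(wavs[i], texts[i]) for i in shared_ids]


pairs = pair_audio_with_translation(
    ["source2.wav", "source1.wav"],
    ["target1.txt", "target2.txt", "target3.txt"],
)

# Write the manifest to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["audio", "translation"])
writer.writerows(pairs)
print(buffer.getvalue())
```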

IndexError while recreating the MuST-C experiment (tgt_sizes has shape (0,))

Hi,
I'm trying to recreate the EN-IT experiment on the MustC corpus and ran into this issue while training:
Traceback (most recent call last):
  File "train.py", line 367, in <module>
    main(args)
  File "train.py", line 73, in main
    shard_id=args.distributed_rank,
  File "FBK-Fairseq-ST/fairseq/tasks/fairseq_task.py", line 96, in get_batch_iterator
    indices = dataset.ordered_indices()
  File "FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 250, in ordered_indices
    indices = indices[np.argsort(self.tgt_sizes[indices], kind='mergesort')]
IndexError: index 216490 is out of bounds for axis 0 with size 0

It seems that the tgt_sizes NumPy array has shape (0,), which is what causes the error. Could you please guide me on resolving this issue? Thanks!
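
An empty target side can be detected before the indices are sorted, which gives a clearer message than the IndexError above. This is a minimal sketch; check_sizes is a hypothetical helper that is not part of FBK-Fairseq-ST, and the size values are made up for illustration. It relies only on len(), so it works for lists and NumPy arrays alike.

```python
from typing import Sequence


def check_sizes(src_sizes: Sequence[int], tgt_sizes: Sequence[int]) -> None:
    """Raise a descriptive error if the target side is empty or misaligned."""
    if len(tgt_sizes) == 0:
        raise ValueError(
            "target side is empty: check that the target files were binarized"
        )
    if len(src_sizes) != len(tgt_sizes):
        raise ValueError(
            f"size mismatch: {len(src_sizes)} source vs {len(tgt_sizes)} target examples"
        )


# Illustrative values: three source examples, no target examples.
try:
    check_sizes([120, 98, 143], [])
    message = "ok"
except ValueError as err:
    message = str(err)

print(message)
```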

Speech Translation on a custom dataset

Hi,
Thank you for your work! Here are a couple of queries that I had:
1. If I wanted to train the ST model on a custom dataset, how would I go about it? What format does the data folder need to be in?
2. Are there any pre-trained models available for ST? If so, can we fine-tune that on a custom dataset?

Thanks!

RuntimeError: CUDA out of memory after training 1 epoch

@mattiadg
I'm currently training on a very large dataset with 4 GPUs, and I get a CUDA out-of-memory error after the first training epoch completes: as soon as validation starts, it runs out of memory.
Here is the exact message:
Tried to allocate 7.93 GiB (GPU 2; 22.38 GiB total capacity; 11.55 GiB already allocated; 3.53 GiB free; 6.75 GiB cached)
Is this a memory leak? Is there an issue with emptying the cache, or do I just need to reduce the batch size/max tokens? (I have already tried halving the batch size, and the same error occurs.)
Thanks!

download the MuST-C corpus

I am currently running a speech translation experiment with FBK-Fairseq-ST from GitHub, but I have run into problems because I do not have enough data. I looked for the MuST-C corpus on its website, but it seems that it can no longer be downloaded. If possible, could you send me a copy of the corpus? I would be very grateful.
Thank you very much!

preprocess.py ValueError: Cannot register model architecture for unknown model type (r_transformer_lm)

When using the preprocess.py, an error pops up saying "ValueError: Cannot register model architecture for unknown model type (r_transformer_lm)".

Generate translation on a single audio

@mattiadg
Once the model is trained, how can I use it to generate translations for a single audio file or a group of audio files? These audio files have no ground-truth (GT) translations associated with them. generate.py requires a binarized data folder, so it will not work as is.