
neurst's Introduction


The primary motivation of NeurST is to help NLP researchers get started with end-to-end speech translation (ST) and to build advanced neural machine translation (NMT) models.

See here for a full list of NeurST examples. We also present recent progress in end-to-end ST technology at https://st-benchmark.github.io/.

NeurST is based on TensorFlow 2, and a PyTorch version is under development.

NeurST News

March 29, 2022: Release of GigaST dataset: a large-scale speech translation corpus.

Aug 16, 2021: Release of models and results for IWSLT 2021 offline ST and simultaneous translation task.

June 15, 2021: Integration of LightSeq for training speedup, see the experimental branch.

March 28, 2021: The v0.1.1 release includes instructions for weight pruning and quantization-aware training of Transformer models, along with several other features. See the release note for more details.

Dec. 25, 2020: The v0.1.0 release includes the overall design of the code structure and recipes for training end-to-end ST models. See the release note for more details.

Highlights

  • Production ready: Models trained by NeurST can be exported directly to the TF SavedModel format and served with TensorFlow Serving, so there is no gap between the research model and the production model. Additionally, NeurST models can be served with LightSeq for much lower latency (see the export sketch after this list).
  • Lightweight: NeurST is designed specifically for end-to-end ST and NMT models, with clean and simple code. It has no dependency on Kaldi, which simplifies installation and usage.
  • Extensibility and scalability: NeurST is carefully designed for extensibility and scalability. It allows users to customize Model, Task, Dataset, etc. and combine them with one another.
  • High computation efficiency: NeurST is computationally efficient and can be further optimized by enabling mixed precision and XLA. Fast distributed training using BytePS / Horovod is also supported for large-scale scenarios.
  • Reliable and reproducible benchmarks: NeurST reports strong baselines with well-tuned hyper-parameters on several benchmark datasets (MT & ST) and provides a series of recipes to reproduce them.
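
For illustration, here is a minimal export sketch in generic TensorFlow 2 (it uses a hypothetical placeholder Keras model and path, not NeurST's own export tooling):

import tensorflow as tf

# A hypothetical placeholder model; in practice this would be a trained NeurST/Keras model.
model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(16,))])

# Export to the SavedModel format consumed by TensorFlow Serving.
# Serving expects a numeric version subdirectory, hence the trailing "/1".
export_dir = "/tmp/exported_model/1"  # hypothetical path
tf.saved_model.save(model, export_dir)

# The parent directory can then be served, e.g. with the tensorflow/serving Docker image:
#   docker run -p 8501:8501 -v /tmp/exported_model:/models/my_model \
#       -e MODEL_NAME=my_model tensorflow/serving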

Pretrained Models & Performance Benchmarks

NeurST provides reference implementations of various models and benchmarks. Please see the examples for model links and NeurST benchmarks on different datasets.

Requirements and Installation

  • Python version >= 3.6
  • TensorFlow >= 2.3.0

Install NeurST from source:

git clone https://github.com/bytedance/neurst.git
cd neurst/
pip3 install -e .

If an ImportError occurs at runtime, manually install the missing packages.
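
As a quick sanity check that the editable install succeeded (a minimal sketch; it only verifies that the packages import):

# Run with: python3 check_install.py  (hypothetical file name)
import tensorflow as tf
import neurst  # raises ImportError if the installation above failed

print("TensorFlow version:", tf.__version__)  # should be >= 2.3.0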

Citation

@InProceedings{zhao2021neurst,
  author       = {Chengqi Zhao and Mingxuan Wang and Qianqian Dong and Rong Ye and Lei Li},
  booktitle    = {the 59th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations},
  title        = {{NeurST}: Neural Speech Translation Toolkit},
  year         = {2021},
  month        = aug,
}

Contact

For any questions or suggestions, please feel free to contact us: [email protected], [email protected].

Acknowledgement

We thank Bairen Yi, Zherui Liu, Yulu Jia, Yibo Zhu, Jiaze Chen, Jiangtao Feng, Zewei Sun for their kind help.

neurst's People

Contributors

lileicc, mct10, rgwt123, yaoming95, zhaocq-nlp


neurst's Issues

wait-k running?

The versions are as follows:
python=3.6
tensorflow-gpu==2.4.0
neurst==0.1.0

Run command:
python3 -m neurst.cli.run_exp \
    --config_paths data/training_args.yml,data/translation_bpe.yml \
    --hparams_set waitk_transformer_base \
    --model_dir checkpoints/waitk_enzh_base \
    --task WaitkTranslation \
    --wait_k 5

After running, the following information appears:
I0127 13:06:57.520065 140639346566976 configurable.py:296] Saving model configurations to directory: checkpoints/waitk_enzh_base
2022-01-27 13:06:57.549626: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-01-27 13:06:57.550100: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2300000000 Hz

It has been stuck here ever since. Is the environment configuration inappropriate? Please recommend a suitable environment setup.

speech_transformer/agument_librispeech: how to set up the command to train on GPU?

Hello, I am currently running the speech_transformer/agument_librispeech code.
My TensorFlow version is 2.4.1 and the earlier steps went fine.
For "Training with validation" I used the example command:

python3 -m neurst.cli.run_exp \
    --config_paths /path_to_data/asr_st/asr_training_args.yml,/path_to_data/asr_st/asr_validation_args.yml \
    --hparams_set speech_transformer_s \
    --model_dir /path_to_data/asr_st/asr_benchmark

Training turned out to be very slow; after checking, I found that it was running on the CPU rather than the GPU. How should I set up the command so that it uses the GPU?
2. When I set --update_cycle n --batch_size 120000//n, it reports
error: argument --batch_size: invalid int value: '120000//n'
Should the n here be replaced with an actual number?
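
(For illustration only, on the assumption that n must be a literal integer because argparse cannot evaluate the expression '120000//n': with n = 4 the flags would read --update_cycle 4 --batch_size 30000.)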

CUDA Version

Hi. I'm working on the must-c example and encountered a problem that might be due to CUDA incompatibility. Could you tell me the CUDA version that was used in your setting (I don't think the github page provided this information) ?
Also, it seems that my institute's GPUs don't support mixed precision. In this case, should I apply --dtype float32?

Thanks again!

How to achieve Transformer ST + ASR pretrain

Hi. I'm currently trying to reproduce and extend the Transformer ST + ASR model, but I'm a bit confused about how to do the pretraining. Specifically, should the lines below be added to the configuration for ASR?

    --pretrain_model /path_to_data/asr_st/asr_benchmark/best_avg \
    --pretrain_variable_pattern "(TransformerEncoder)|(input_audio)"

Would it be possible to have an example for ST+ASR? Thanks!

Speech Transformer Issue

Hi. I'm trying to reproduce the speech transformer portion but have encountered an issue while training.
My setup used the example given:

python3 -m neurst.cli.run_exp \
    --config_paths /path_to_data/asr_st/asr_training_args.yml,/path_to_data/asr_st/asr_validation_args.yml \
    --hparams_set speech_transformer_s \
    --model_dir /path_to_data/asr_st/asr_benchmark

The system is producing this

TypeError: graph() got an unexpected keyword argument 'step'

I'm trying to isolate the issue and see if it's a dependency issue. Any help is appreciated.

MT training error

I'm training the MT model in the MuST-C example by modifying the given commands to their MT equivalents, namely

python3 -m neurst.cli.run_exp \
    --config_paths /mt/zh/mt_training_args.yml,/mt/zh/mt_validation_args.yml \
    --hparams_set speech_transformer_s \
    --model_dir /mt/mt_benchmark

However, I've encountered

    fake_src = numpy.random.rand(1, 4, src_meta["audio_feature_dim"], src_meta["audio_feature_channels"])
KeyError: 'audio_feature_dim'

This is only happening to my MT training. The ASR is running smoothly. Also, I double-checked and I'm pretty sure all required packages are installed and match the dependencies.

Finally, I'm curious whether separately training the ASR and MT models is equivalent to the "Cascaded" model, or whether I have to use the cascade_st tool to achieve a cascaded setup?

Thanks.

How to continue training from a saved checkpoint

Hi. I have limited walltime for GPU usage so I can only train a certain amount of steps each time. I'd like to continue training from the last saved checkpoint, so I can get a greater number of training steps. Is there a way to configure the training to start from a certain step?

Thanks!

Reproducing speech-to-text translation result on MuST-C En-De and how to get the model output

Hi all,

I'm trying to reproduce the speech-to-text translation result on MuST-C en-de (22.8 SacreBLEU) as reported in the paper.

I used the given st_specaug checkpoint and got the following results (command given below):

python3 -m neurst.cli.run_exp \
    --config_paths must-c/asr_st/de/st_prediction_args.yml \
    --model_dir st_specaug

Results:

I1125 12:34:33.630551 140283317171008 sequence_generator.py:140] Generation elapsed: 267.03s
I1125 12:34:33.636475 140283317171008 dataset_utils.py:307] Loading TF Records from:
I1125 12:34:33.641129 140283317171008 dataset_utils.py:311]    b'must-c//devtest/dev.en-de.tfrecords-00000-of-00001'
I1125 12:34:38.733426 140283317171008 sequence_generator.py:166] Evaluation Result (dev):
I1125 12:34:38.733613 140283317171008 sequence_generator.py:170]    sacre_bleu=21.59
I1125 12:34:38.733672 140283317171008 sequence_generator.py:170]    tok_bleu=21.60
I1125 12:34:38.733724 140283317171008 sequence_generator.py:170]    detok_bleu=21.59
I1125 12:34:38.733772 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.40
I1125 12:34:38.733818 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.21
I1125 12:34:38.733864 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.40
I1125 12:34:38.735978 140283317171008 dataset_utils.py:307] Loading TF Records from:
I1125 12:34:38.739267 140283317171008 dataset_utils.py:311]    b'must-c//devtest/tst-COMMON.en-de.tfrecords-00000-of-00001'
I1125 12:34:46.309522 140283317171008 sequence_generator.py:166] Evaluation Result (tst-COM):
I1125 12:34:46.309664 140283317171008 sequence_generator.py:170]    sacre_bleu=22.34
I1125 12:34:46.309723 140283317171008 sequence_generator.py:170]    tok_bleu=22.32
I1125 12:34:46.309772 140283317171008 sequence_generator.py:170]    detok_bleu=22.34
I1125 12:34:46.309818 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=23.04
I1125 12:34:46.309862 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.93
I1125 12:34:46.309904 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=23.04
I1125 12:34:46.310050 140283317171008 sequence_generator.py:166] Evaluation Result (on average by weights {'dev': 0.5, 'tst-COM': 0.5}):
I1125 12:34:46.310102 140283317171008 sequence_generator.py:170]    sacre_bleu=21.96
I1125 12:34:46.310146 140283317171008 sequence_generator.py:170]    tok_bleu=21.96
I1125 12:34:46.310188 140283317171008 sequence_generator.py:170]    detok_bleu=21.96
I1125 12:34:46.310238 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.72
I1125 12:34:46.310281 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.57
I1125 12:34:46.310323 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.72
I1125 12:34:55.999752 140283317171008 sequence_generator.py:166] Evaluation Result (mixed of dev,tst-COM):
I1125 12:34:56.000025 140283317171008 sequence_generator.py:170]    sacre_bleu=22.06
I1125 12:34:56.000084 140283317171008 sequence_generator.py:170]    tok_bleu=22.06
I1125 12:34:56.000133 140283317171008 sequence_generator.py:170]    detok_bleu=22.06
I1125 12:34:56.000186 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.80
I1125 12:34:56.000232 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.69
I1125 12:34:56.000276 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.80

I find that 22.8 corresponds to the mixed result rather than to tst-COM. Is there anything I missed during evaluation, or am I misunderstanding the evaluation results? Also, I did not get the translation output with the above command. Could you please give more details on how to obtain the model's translation output so that translation analysis is possible?

thanks,
Biao

Where are the configuration parameters for each model?

I cannot find the parameters of each layer or model in your code. For example, in LightConvolutionLayer:
kernel_size=params["conv_kernel_size_list"][lid],
num_heads=params["num_conv_heads"],
conv_type=params["conv_type"],
conv_dim=params["conv_hidden_size"],
use_glu=params["glu_after_proj"],
weight_dropout_rate=params["conv_weight_dropout_rate"],

Easy to raise 'NaN'

It's easy to run into a 'NaN' error when training the translation model with 'transformer_base'. Have you ever encountered this problem, and how did you deal with it?

ASR Training in Must-C Example Stuck in Pipeline

Hi. I'm working on the ASR training step from the Must-C example. After executing

python3 -m neurst.cli.run_exp \
    --config_paths /path_to_data/asr_st/asr_training_args.yml,/path_to_data/asr_st/asr_validation_args.yml \
    --hparams_set speech_transformer_s \
    --model_dir /path_to_data/asr_st/asr_benchmark

the training process became stuck for days.

Looking at the output, it seems like it got stuck at "Training for 200000 steps...Saving model configurations to directory:.."
One of the last lines of output is this

Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.

I'm not sure whether it has hit an error or is just slow (due to GPU incompatibility), so I'm looking for general ideas. Also, just to double-check: in the middle of training, are there supposed to be progress messages, or is it completely silent until training finishes?

Thanks!
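
For reference, the tf.data options object that the warning quoted above refers to would be built roughly as follows (a generic TF 2.x sketch, not NeurST-specific code; dataset here is a placeholder):

import tensorflow as tf

# Placeholder dataset; in practice this is whatever pipeline feeds training.
dataset = tf.data.Dataset.from_tensor_slices(list(range(8)))

options = tf.data.Options()
# Shard by data instead of by files, as the warning message suggests.
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
dataset = dataset.with_options(options)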

speech transformer example error

I am trying to follow the speech transformer example with the MuST-C 1.0 en-de dataset.

After preprocessing with the commands

./examples/speech_transformer/must-c/02-audio_feature_extraction.sh ../../data/dataset de --untar

./examples/speech_transformer/must-c/03-preprocess.sh ../mosesdecoder/ ../../data/dataset de

I tried

python3 -m neurst.cli.run_exp \
    --config_paths /home/sylee/data/dataset/asr_st/de/asr_training_args.yml,/home/sylee/data/dataset/asr_st/de/asr_validation_args.yml \
    --hparams_set speech_transformer_s \
    --model_dir /home/sylee/data/dataset/asr_st/de/asr_benchmark

but it keeps producing the following tensor conversion error.

Sorry for the elementary question.

TypeError: Failed to convert object of type <class 'neurst.training.revised_dynamic_loss_scale.RevisedDynamicLossScale'> to Tensor. Contents: <neurst.training.revised_dynamic_loss_scale.RevisedDynamicLossScale object at 0x7f6b141da970>. Consider casting elements to a supported type.
Traceback (most recent call last):
  File "/home/sylee/.conda/envs/test2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sylee/.conda/envs/test2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sylee/neurst/neurst/neurst/cli/run_exp.py", line 127, in <module>
    cli_main()
  File "/home/sylee/neurst/neurst/neurst/cli/run_exp.py", line 123, in cli_main
    app.run(_main, argv=["pseudo.py"])
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/sylee/neurst/neurst/neurst/cli/run_exp.py", line 118, in _main
    run_experiment(args, remaining_argv)
  File "/home/sylee/neurst/neurst/neurst/cli/run_exp.py", line 107, in run_experiment
    entry.run()
  File "/home/sylee/neurst/neurst/neurst/exps/trainer.py", line 294, in run
    history = keras_model.fit(
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/engine/training.py", line 1184, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 759, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3066, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3298, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 668, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 994, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/engine/training.py:853 train_function  *
        return step_function(self, iterator)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/engine/training.py:842 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/mirrored_strategy.py:670 _call_for_each_replica
        return mirrored_run.call_for_each_replica(
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/mirrored_run.py:104 call_for_each_replica
        return _call_for_each_replica(strategy, fn, args, kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/mirrored_run.py:246 _call_for_each_replica
        coord.join(threads)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/training/coordinator.py:389 join
        six.reraise(*self._exc_info_to_raise)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/six.py:703 reraise
        raise value
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/training/coordinator.py:297 stop_on_exception
        yield
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/distribute/mirrored_run.py:346 run
        self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/engine/training.py:835 run_step  **
        outputs = model.train_step(data)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/engine/training.py:791 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/optimizer_v2/optimizer_v2.py:520 minimize
        grads_and_vars = self._compute_gradients(
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/mixed_precision/loss_scale_optimizer.py:676 _compute_gradients
        loss = self.get_scaled_loss(loss)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/mixed_precision/loss_scale_optimizer.py:644 get_scaled_loss
        return loss * tf.cast(self.loss_scale, loss.dtype)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/mixed_precision/loss_scale_optimizer.py:906 __getattribute__
        return object.__getattribute__(self, name)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/keras/mixed_precision/loss_scale_optimizer.py:568 loss_scale
        return tf.convert_to_tensor(self._loss_scale)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:206 wrapper
        return target(*args, **kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:1430 convert_to_tensor_v2_with_dispatch
        return convert_to_tensor_v2(
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:1436 convert_to_tensor_v2
        return convert_to_tensor(
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/profiler/trace.py:163 wrapped
        return func(*args, **kwargs)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:1566 convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:346 _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:271 constant
        return _constant_impl(value, dtype, shape, name, verify_shape=False,
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:288 _constant_impl
        tensor_util.make_tensor_proto(
    /home/sylee/.conda/envs/test2/lib/python3.9/site-packages/tensorflow/python/framework/tensor_util.py:551 make_tensor_proto
        raise TypeError("Failed to convert object of type %s to Tensor. "

    TypeError: Failed to convert object of type <class 'neurst.training.revised_dynamic_loss_scale.RevisedDynamicLossScale'> to Tensor. Contents: <neurst.training.revised_dynamic_loss_scale.RevisedDynamicLossScale object at 0x7f6b141da970>. Consider casting elements to a supported type.

The detailed implementation of lightweight convolution?

Hi,

Thank you for your open-source work. I am struggling with the implementation of lightweight convolution. The kernel code in LightweightConv is written as:

w = tf.nn.softmax(self._conv_shared_weight)

filter = tf.reshape(tf.repeat(tf.transpose(w), repeats=x_dim // self._num_heads),

I tried this example:

import tensorflow as tf

num_heads = 16
x_dim = 512
kernel_size = 3
_conv_shared_weight = tf.random.normal((num_heads, kernel_size))  # [H, ks]
w = tf.nn.softmax(_conv_shared_weight)   # [H, ks]
print("transposed w: ", tf.transpose(w))
filter = tf.reshape(tf.repeat(tf.transpose(w), repeats=x_dim // num_heads),
                    [kernel_size, x_dim])   # [ks, 512]
print("filter: ", filter.shape)
print("filter: ", filter[:, 20:40])

And got this result:

transposed w:  tf.Tensor(
[[0.15676415 0.14292802 0.28718808 0.13553983 0.5141512  0.25704482
  0.51052123 0.29408646 0.63065183 0.47200555 0.1983979  0.11646753
  0.29223946 0.35182765 0.5852054  0.07909278]
 [0.4588186  0.5518362  0.569815   0.13879508 0.24509494 0.58328515
  0.4027575  0.65089965 0.22106884 0.10732753 0.53596777 0.72198284
  0.15367371 0.44660234 0.19824868 0.4396185 ]
 [0.38441733 0.30523574 0.1429969  0.72566503 0.24075384 0.15966997
  0.08672129 0.05501387 0.1482793  0.42066696 0.2656343  0.16154966
  0.55408686 0.20156996 0.21654584 0.4812887 ]], shape=(3, 16), dtype=float32)
filter: [3, 512]
filter:  tf.Tensor(
[[0.15676415 0.15676415 0.15676415 0.15676415 0.15676415 0.15676415
  0.15676415 0.15676415 0.15676415 0.15676415 0.15676415 0.15676415
  0.14292802 0.14292802 0.14292802 0.14292802 0.14292802 0.14292802
  0.14292802 0.14292802]
 [0.4588186  0.4588186  0.4588186  0.4588186  0.4588186  0.4588186
  0.4588186  0.4588186  0.4588186  0.4588186  0.4588186  0.4588186
  0.5518362  0.5518362  0.5518362  0.5518362  0.5518362  0.5518362
  0.5518362  0.5518362 ]
 [0.38441733 0.38441733 0.38441733 0.38441733 0.38441733 0.38441733
  0.38441733 0.38441733 0.38441733 0.38441733 0.38441733 0.38441733
  0.30523574 0.30523574 0.30523574 0.30523574 0.30523574 0.30523574
  0.30523574 0.30523574]], shape=(3, 20), dtype=float32)

The values in the final kernel weights look strange: the first x_dim // num_heads = 32 dimensions are all the same, and the next 32 dimensions are all the same. Is this correct?
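
For what it is worth, this pattern matches the channel-grouped weight sharing of lightweight convolutions: each of the num_heads softmax-normalized kernels is tiled over x_dim // num_heads consecutive channels. A tiny sketch of the same tf.repeat + tf.reshape semantics with small dimensions (my own illustration, not NeurST code):

import tensorflow as tf

num_heads, x_dim, kernel_size = 2, 8, 3
w = tf.nn.softmax(tf.random.normal((num_heads, kernel_size)), axis=-1)  # [H, ks]

# transpose -> [ks, H]; tf.repeat tiles each scalar x_dim // num_heads times
# along the flattened axis; reshape restores [ks, x_dim].
tiled = tf.reshape(tf.repeat(tf.transpose(w), repeats=x_dim // num_heads),
                   [kernel_size, x_dim])

# Row k now holds w[0, k] for the first x_dim // num_heads channels,
# then w[1, k] for the next group, and so on: one shared weight per head group.
print(tiled)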

speech_transformer: problems during the ST validation stage

Hello, I ran into the following problems while reproducing augmented_librispeech:
1. When validating the ASR model, my WER is over 100, which is quite far from the reported result of 8 😂.
Because of hardware limitations, I reduced train_steps to 10000 (1/20 of the original) and batch_size to 60000 (1/2 of the original).
Would simply increasing train_steps fix this (I am about to adjust the parameters and rerun), or is there some other cause?

2. During validation in the ST training stage, I found that no matter what the audio input is, every hypothesis comes out as the same sentence (see the attached figure). What could be the cause?
[figure: repeated ST translations]
