nvidia / nemo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Home Page: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

License: Apache License 2.0

Python 71.37% Shell 0.18% Dockerfile 0.03% Jupyter Notebook 28.25% HTML 0.02% CSS 0.01% Makefile 0.01% C++ 0.12% Groovy 0.02%
machine-translation speaker-recognition asr tts generative-ai multimodal deeplearning neural-networks speaker-diariazation speech-translation

nemo's Introduction

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

NVIDIA NeMo Framework

Latest News

Large Language Models and Multimodal
Accelerate your generative AI journey with NVIDIA NeMo framework on GKE (2024/03/16) An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.

Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso (2024/03/06) Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.

New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.

(Figure: NeMo training performance on NVIDIA H200.)

NVIDIA now powers training for Amazon Titan Foundation models (2023/11/28) NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.

Introduction

NVIDIA NeMo Framework is a generative AI framework built for researchers and PyTorch developers working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia to more easily implement and design new generative AI models by leveraging existing code and pretrained models.

For technical documentation, please see the NeMo Framework User Guide.

All NeMo models are trained with Lightning, and training is automatically scalable to thousands of GPUs.

When applicable, NeMo models take advantage of the latest possible distributed training techniques, including parallelism strategies such as

  • data parallelism
  • tensor parallelism
  • pipeline model parallelism
  • fully sharded data parallelism (FSDP)
  • sequence parallelism
  • context parallelism
  • mixture-of-experts (MoE)

as well as mixed-precision training recipes with bfloat16 and FP8.
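
As a rough illustration of how these options surface to users, the sketch below builds a configuration fragment with OmegaConf (a NeMo dependency). The key names follow the style of NeMo's Megatron-based LLM configs but may differ between releases, so treat them as assumptions rather than an exact schema.

# Minimal sketch, assuming OmegaConf and config keys in the style of NeMo's
# Megatron-based LLM configs; exact names may differ by release.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "trainer": {"devices": 8, "num_nodes": 2, "precision": "bf16"},
    "model": {
        "tensor_model_parallel_size": 2,    # tensor parallelism
        "pipeline_model_parallel_size": 2,  # pipeline model parallelism
        "sequence_parallel": True,          # sequence parallelism
        "micro_batch_size": 1,
        "global_batch_size": 256,           # remaining scale comes from data parallelism
    },
})
print(OmegaConf.to_yaml(cfg))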

NeMo's Transformer-based LLM and Multimodal models leverage NVIDIA Transformer Engine for FP8 training on NVIDIA Hopper GPUs and NVIDIA Megatron Core for scaling transformer model training.

NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF); see NVIDIA NeMo Aligner for more details.

NeMo LLM and Multimodal models can be deployed and optimized with NVIDIA Inference Microservices (Early Access).

NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with NVIDIA Riva.

For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the NeMo Framework Launcher. The NeMo Framework Launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models, and it also has an Autoconfigurator that can be used to find the optimal model-parallel configuration for training on a specific cluster. To get started quickly with the NeMo Framework Launcher, please see the NeMo Framework Playbooks. The NeMo Framework Launcher does not currently support ASR and TTS training, but it will soon.

Getting started with NeMo is simple. State-of-the-art pretrained NeMo models are freely available on the Hugging Face Hub and NVIDIA NGC. These models can be used to generate text or images, transcribe audio, and synthesize speech in just a few lines of code.
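
As a minimal sketch of that workflow (assuming a local nemo_toolkit[asr] install and a WAV file on disk; the pretrained model name is one illustrative checkpoint from NGC/Hugging Face Hub):

# Minimal sketch: the audio path is a placeholder, and the model name is one
# illustrative pretrained checkpoint; any published ASR checkpoint can be used.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="QuartzNet15x5Base-En")
transcripts = asr_model.transcribe(["path/to/audio.wav"])
print(transcripts[0])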

We have extensive tutorials that can be run on Google Colab or with our NGC NeMo Framework Container, and we have playbooks for users who want to train NeMo models with the NeMo Framework Launcher.

For advanced users who want to train NeMo models from scratch or fine-tune existing NeMo models, we have a full suite of example scripts that support multi-GPU/multi-node training.
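
As a rough sketch of that flow (the file names are hypothetical; the example scripts under examples/ wire the same restore step into Hydra-configured, multi-GPU Lightning training):

# Minimal sketch, assuming a locally saved .nemo checkpoint; the file names
# below are hypothetical placeholders.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.restore_from("my_model.nemo")
# ...attach a pytorch_lightning.Trainer and new train/validation data configs
# here, then call trainer.fit(model) to fine-tune...
model.save_to("my_finetuned_model.nemo")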

Key Features

Requirements

  1. Python 3.10 or above
  2. PyTorch 1.13.1 or above
  3. NVIDIA GPU, if you intend to do model training

Developer Documentation

Latest: documentation of the latest (i.e. main) branch.
Stable: documentation of the stable (i.e. most recent release) branch.

Getting help with NeMo

FAQ can be found on NeMo's Discussions board. You are welcome to ask questions or start discussions there.

Installation

The NeMo Framework can be installed in a variety of ways. Depending on your domain and needs, you may find one of the following installation methods more suitable.

  • Conda / Pip - Refer to the Conda and Pip sections for installation instructions.
    • This is recommended for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) domains.
    • When using an NVIDIA PyTorch container as the base, this is the recommended installation method for all domains.
  • Docker - Refer to the Docker containers section for installation instructions.
    • This is recommended for Large Language Models (LLM), Multimodal and Vision domains.
    • NeMo LLM & Multimodal Container - nvcr.io/nvidia/nemo:24.01.01.framework
    • NeMo Speech Container - nvcr.io/nvidia/nemo:24.01.speech

Conda

We recommend installing NeMo in a fresh Conda environment.

conda create --name nemo python==3.10.12
conda activate nemo

Install PyTorch using their configurator.

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

The command used to install PyTorch may depend on your system. Please use the configurator linked above to find the right command for your system.

Pip

Use this installation mode if you want the latest released version.

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit['all']

Depending on the shell used, you may need to use "nemo_toolkit[all]" instead in the above command.

Pip (Domain Specific)

To install only a specific domain of NeMo, use the following commands. Note: the prerequisites above must be installed before installing a specific domain of NeMo.

pip install nemo_toolkit['asr']
pip install nemo_toolkit['nlp']
pip install nemo_toolkit['tts']
pip install nemo_toolkit['vision']
pip install nemo_toolkit['multimodal']

Pip from source

Use this installation mode if you want the version from a particular GitHub branch (e.g. main).

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]

From source

Use this installation mode if you are contributing to NeMo.

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

If you only want the toolkit without additional conda-based dependencies, you may replace reinstall.sh with pip install -e . when your PWD is the root of the NeMo repository.

Mac computers with Apple silicon

To install NeMo on Mac with Apple M-Series GPU:

  • create a new Conda environment
  • install PyTorch 2.0 or higher
  • run the following code:
# [optional] install mecab using Homebrew, to use sacrebleu for NLP collection
# you can install Homebrew here: https://brew.sh
brew install mecab

# [optional] install pynini using Conda, to use text normalization
conda install -c conda-forge pynini

# install Cython manually
pip install cython

# clone the repo and install in development mode
git clone https://github.com/NVIDIA/NeMo
cd NeMo
pip install 'nemo_toolkit[all]'

# Note that only the ASR toolkit is guaranteed to work on MacBook - so for MacBook use pip install 'nemo_toolkit[asr]'

Windows Computers

One of the options is using Windows Subsystem for Linux (WSL).

To install WSL:

  • In PowerShell, run the following code:
wsl --install
# [note] If you run wsl --install and see the WSL help text, it means WSL is already installed.

Learn more about installing WSL at Microsoft's official documentation.

After Installing your Linux distribution with WSL:
  • Option 1: Open the distribution (Ubuntu by default) from the Start menu and follow the instructions.
  • Option 2: Launch the Terminal application. Download it from Microsoft's Windows Terminal page if not installed.

Next, follow the instructions for Linux systems, as provided above. For example:

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

RNNT

Note that RNNT requires numba to be installed from conda.

conda remove numba
pip uninstall numba
conda install -c conda-forge numba

Apex

NeMo LLM Domain training requires NVIDIA Apex to be installed. Install it manually if not using the NVIDIA PyTorch container.

To install Apex, run

git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout b496d85fb88a801d8e680872a12822de310951fd
pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./

It is highly recommended to use the NVIDIA PyTorch or NeMo container if you are having issues installing Apex or any other dependencies.

While installing Apex, it may raise an error if the CUDA version on your system does not match the CUDA version PyTorch was compiled with. This check can be avoided by commenting out the raise here: https://github.com/NVIDIA/apex/blob/master/setup.py#L32

cuda-nvprof is needed to install Apex. The version should match the CUDA version that you are using:

conda install -c nvidia cuda-nvprof=11.8

packaging is also needed:

pip install packaging

With the latest versions of Apex, the pyproject.toml file in Apex may need to be deleted in order to install locally.

Transformer Engine

The NeMo LLM Domain has been integrated with NVIDIA Transformer Engine. Transformer Engine enables FP8 training on NVIDIA Hopper GPUs. Install it manually if not using the NVIDIA PyTorch container.

pip install --upgrade git+https://github.com/NVIDIA/TransformerEngine.git@stable

It is highly recommended to use the NVIDIA PyTorch or NeMo container if you are having issues installing Transformer Engine or any other dependencies.

Transformer Engine requires PyTorch to be built with CUDA 11.8.

Flash Attention

When training Large Language Models in NeMo, users may opt to use Flash Attention for efficient training. Transformer Engine already supports Flash Attention for GPT models. If you want to use Flash Attention for non-causal models, please install flash-attn. If you want to use Flash Attention with an attention bias (introduced by position encodings such as ALiBi), please also install the pinned version of triton, following the implementation.

pip install flash-attn
pip install triton==2.0.0.dev20221202

NLP inference UI

To launch the inference web UI server, please install gradio.

pip install gradio==3.34.0

NeMo Text Processing

NeMo Text Processing, specifically (Inverse) Text Normalization, is now a separate repository https://github.com/NVIDIA/NeMo-text-processing.

Docker containers

We release NeMo containers alongside NeMo releases. For example, NeMo r1.23.0 comes with the container nemo:24.01.speech; you may find more details about released containers on the releases page.

To use a pre-built container, please run

docker pull nvcr.io/nvidia/nemo:24.01.speech

To build a NeMo container from a branch's Dockerfile, please run

DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .

If you choose to work with the main branch, we recommend using NVIDIA's PyTorch container version 23.10-py3 and then installing from GitHub.

docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:23.10-py3

Examples

Many examples can be found under the "Examples" folder.

Contributing

We welcome community contributions! Please refer to CONTRIBUTING.md for the process.

Publications

We provide an ever-growing list of publications that utilize the NeMo framework.

If you would like to add your own article to the list, you are welcome to do so via a pull request to this repository's gh-pages-src branch. Please refer to the instructions in the README of that branch.

License

NeMo is released under an Apache 2.0 license.

nemo's People

Contributors

anteju, arendu, blisc, bmwshop, borisfom, chiphuyen, cuichenx, drnikolaev, ekmb, ericharper, fayejf, github-actions[bot], maximumentropy, michalivne, nithinraok, oktai15, okuchaiev, redoctopus, rlangman, seannaren, stasbel, stevehuang52, tango4j, titu1994, tkornuta-nvidia, vahidoox, vsl9, xuesongyang, yidong72, yzhang123


nemo's Issues

Training non-English ASR model

Hello!
I tried to train a Russian ASR model based on the 1_ASR_tutorial_using_NeMo.ipynb notebook (from NeMo/examples/asr/notebooks/) using Google Colab. I used the jasper_an4.yaml and quartznet5x3.yaml configs with the labels changed to the Russian alphabet ("а", "б", "в", "г", etc.) and WAV files with Russian speech.
During training I got an empty "Reference" at every step. There was a "Prediction" on the first step and then only empty rows. The loss seems to be correct, but WER is infinite in this case. Is there a problem with the encoding?
At inference I got predictions containing only whitespace, and again WER is infinite.

I would appreciate any hint on how to solve this issue.

Issue while installing swig

Note: commands executed inside the container

  1. While following the instructions from https://nvidia.github.io/NeMo/asr/tutorial.html#inference for using klm, I got an error like "swig package not available" while running apt-get install swig.

This was fixed by running apt-get update before running apt-get install swig.

  2. Also, sudo (in sudo apt-get install swig) is not required inside the container.

  3. install_decoders.sh was failing with a gcc error. I had to run the following to fix it:
    apt-get install ssh pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python-dev git-core libsndfile1-dev python-setuptools libboost-all-dev //NOTE: I'm not sure if all the dependencies are required

TIMIT ?

Hi there !

Do you have a TIMIT recipe already existing?

Thanks for the toolkit!

ONNX Export NoneType error

Hello again. I'm trying to export quartznet15x5 v2 to ONNX with master (f946aca).

With the following command:

!python export_jasper_to_onnx.py --config quartznet15x5.yaml  \
--nn_encoder JasperEncoder-STEP-247400.pt --nn_decoder JasperDecoderForCTC-STEP-247400.pt  \
--onnx_encoder encoder.onnx --onnx_decoder decoder.onnx

Failing with:

Loading config file...
Determining model shape...
  Num encoder input features: 64
  Num decoder input features: 1024
Initializing models...
Loading checkpoints...
Exporting encoder...
2019-12-15 06:57:12,846 - WARNING - Module is JasperEncoder. We are removinginput and output length ports since they are not needed for deployment
2019-12-15 06:57:12,847 - WARNING - Turned off 0 masked convolutions
2019-12-15 06:57:12,848 - ERROR - ERROR: module export failed for JasperEncoder with exception 'NoneType' object has no attribute 'to'
Exporting decoder...
graph(%encoder_output : Float(1, 1024, 128),
      %1 : Float(29),
      %2 : Float(29, 1024, 1)):
  %3 : Float(1, 29, 128) = onnx::Conv[dilations=[1], group=1, kernel_shape=[1], pads=[0, 0], strides=[1]](%encoder_output, %2, %1), scope: JasperDecoderForCTC/Sequential[decoder_layers]/Conv1d[0] # /usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py:202:0
  %4 : Float(1, 128, 29) = onnx::Transpose[perm=[0, 2, 1]](%3), scope: JasperDecoderForCTC # /usr/local/lib/python3.6/dist-packages/nemo_asr/jasper.py:207:0
  %output : Float(1, 128, 29) = onnx::LogSoftmax[axis=2](%4), scope: JasperDecoderForCTC # /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1317:0
  return (%output)

/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py:772: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input encoder_output
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py:772: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input output
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
Export completed successfully.

Is ONNX export compatible with the latest model using master?

Kaldi unittest fails

======================================================================
ERROR: test_kaldi_dataloader (tests.test_asr.TestASRPytorch)

Traceback (most recent call last):
File "/home/okuchaiev/repos/NeMo/tests/test_asr.py", line 165, in test_kaldi_dataloader
batch_size=batch_size
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/data_layer.py", line 464, in init
self._dataset = KaldiFeatureDataset(**dataset_params)
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/parts/dataset.py", line 224, in init
for utt_id, feats in kaldi_io.read_mat_scp(feats_path)
File "/home/okuchaiev/repos/NeMo/collections/nemo_asr/nemo_asr/parts/dataset.py", line 222, in
id2feats = {
File "/home/okuchaiev/anaconda3/envs/py37/lib/python3.7/site-packages/kaldi_io-0.9.1-py3.7.egg/kaldi_io/kaldi_io.py", line 343, in read_mat_scp
fd = open_or_fd(file_or_fd)
File "/home/okuchaiev/anaconda3/envs/py37/lib/python3.7/site-packages/kaldi_io-0.9.1-py3.7.egg/kaldi_io/kaldi_io.py", line 63, in open_or_fd
fd = open(file, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/asr/kaldi_an4/feats.scp'

Pillow version gives error

When running an nlp model inside nemo container I received the error
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL'.
The installed Pillow version was 7, after downgrading to 6 it was fine.

Quartz: issues in replicating results

Hi,

I am trying to use the NeMo implementation of Quartz to replicate the results presented in this paper.

However I am facing some issues. First of all, the pretrained encoder model has a different structure with respect to the one implemented in Nemo. In particular, to be able to load the state dictionary I had to modify Masked1DConv to inherit from 1DConv (as in the original Jasper implementation).
Moreover, there are discrepancies in the names of the layers that have to be fixed in order to load the pretrained model properly.

After my attempts at fixing these issues, I still was not able to reach the performance mentioned in the paper. I tried evaluating on dev_other and reached 16.9% WER, which is much higher than the 11.58% reported in the paper.

I used the configuration file and the pretrained model that can be found here.

The validation is run inside a Docker container built from the Dockerfile available in the repo. The only minor difference is the version of the PyTorch image used, which is 19.09 instead of 19.11 because of some issues with CUDA drivers that wouldn't allow me to use the GPUs.

Any help would be much appreciated. Thank you!

tts_infer.py return segmentation fault

Ubuntu 18.04
Python3.6.8,
Pytorch 1.3
GPU:1080ti

I've downloaded the tacotron2 and waveglow models from NGC:
tacotron2 model:https://ngc.nvidia.com/catalog/models/nvidia:tacotron2_ljspeech
waveglow model:https://ngc.nvidia.com/catalog/models/nvidia:waveglow_ljspeech

NeMo/examples/tts/tts_infer.py

I ran the command below and got a segmentation fault.
python3 tts_infer.py --spec_model=tacotron2 --spec_model_config=configs/tacotron2.yaml --spec_model_load_dir=tacotron2_checkopints/ --vocoder=waveglow --vocoder_model_config=configs/waveglow.yaml --vocoder_model_load_dir=waveglow_checkopints/ --save_dir=wav_files/ --eval_dataset=test.json

Add support / option for ASR audio streaming

Currently, the ASR collection supports audio files as the input for training and prediction. However, a lot of ASR use cases involve streaming audio for prediction. In streaming, the bytes of audio are sent in intervals. It would be useful to have an option in the relevant classes (AudioToTextDataLayer and others) to accept bytes (and not just a path to audio files) as input. Then, using NeMo-ASR models for streaming would be possible.

Is there a plan to add this?

jasper_inference seems very slow

Hi,
I use jasper_infer.py on my desktop and follow the tutorial.
But the inference speed seems so slow. The beam size is 100.
I use my own evaluation dataset, which has 14000 samples.
I tested the ctc decoder with language model alone and the speed seems faster.
I'm wondering how I can debug this to find the problem.

Another thing is that when I use jasper_infer.py on NGC, I find there is no further output in the log after it shows:

2019-10-11 05:31:57,269 - WARNING - No batch_size specified in the data layer. Setting batch_size to 1.
2019-10-11 05:31:57,378 - WARNING - When constructing AudioToTextDataLayer. The base NeuralModule class received the following unused arguments:
2019-10-11 05:31:57,378 - WARNING - dict_keys(['batch_size'])
2019-10-11 05:31:59,696 - INFO - Dataset loaded with 18.09 hours. Filtered 0.00 hours.
2019-10-11 05:31:59,696 - INFO - Evaluating 14326 examples
2019-10-11 05:31:59,699 - INFO - PADDING: 16
2019-10-11 05:31:59,699 - INFO - STFT using conv
2019-10-11 05:32:45,601 - INFO - ================================
2019-10-11 05:32:45,603 - INFO - Number of parameters in encoder: 18894656
2019-10-11 05:32:45,603 - INFO - Number of parameters in decoder: 4406475
2019-10-11 05:32:45,604 - INFO - Total number of parameters in decoder: 23301131
2019-10-11 05:32:45,604 - INFO - ================================
2019-10-11 05:32:45,946 - INFO - Restoring JasperEncoder from /nemo_project/nemo_projects/aishell/checkpoint/JasperEncoder-STEP-72000.pt
2019-10-11 05:32:46,803 - INFO - Restoring JasperDecoderForCTC from /nemo_project/nemo_projects/aishell/checkpoint/JasperDecoderForCTC-STEP-72000.pt

It seems it is waiting for something ...

examples/nlp/nmt_tutorial.py: how to generate YouTokenToMe model for custom language?

Dear Team of Neural Machine Translation,

# pass a YouTokenToMe model to YouTokenToMeTokenizer for de
# if the target is zh, we should pass a vocabulary file, e.g. zh_vocab.txt
src: examples/nlp/nmt_tutorial.py

How can I generate a YouTokenToMe model for a custom language?
I would appreciate it if you could provide any instructions for this.

Thank you in advance!

No module named 'nemo_nlp.utils'

Same issue as #84; I tried the recommended solution and it did not work.

I installed NeMo with:

pip install nemo_toolkit nemo_asr nemo_nlp

Running the following Python import code, I get an error

import torch
print("PyTorch Version:", torch.__version__)
import nemo
print("NeMo Version:", nemo.__version__)
import nemo_nlp
print("NeMo NLP Version:", nemo_nlp.__version__)

Output:

PyTorch Version: 1.2.0
NeMo Version: 0.8.1
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-9d2b1d1a62b9> in <module>()
      5 print("NeMo Version:", nemo.__version__)
      6 
----> 7 import nemo_nlp
      8 print("NeMo NLP Version:", nemo_nlp.__version__)

/usr/local/lib/python3.6/dist-packages/nemo_nlp/data/datasets/utils.py in <module>()
     17 from nemo.utils.exp_logging import get_logger
     18 
---> 19 from ...utils.nlp_utils import (get_vocab,
     20                                 write_vocab,
     21                                 write_vocab_in_order,

ModuleNotFoundError: No module named 'nemo_nlp.utils'

ASR Tutorial Notebook: RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).`

Getting following error when running "examples/asr/notebooks/1_ASR_tutorial_using_NeMo.ipynb" on Google Colab with GPU runtime

"RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).`"

2019-10-27 04:10:44,396 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
2019-10-27 04:10:44,408 - INFO - Restoring checkpoint from folder ./an4_checkpoints ...
Selected optimization level O0: Pure FP32 training.

Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Starting .....
No file matches in ./an4_checkpoints
Checkpoint folder ./an4_checkpoints present but did not restore
Starting epoch 0

RuntimeError Traceback (most recent call last)
in ()
4 optimizer='novograd',
5 optimization_params={
----> 6 "num_epochs": 150, "lr": 0.01, "weight_decay": 1e-4
7 })
8

4 frames
/usr/local/lib/python3.6/dist-packages/nemo/core/neural_factory.py in train(self, tensors_to_optimize, optimizer, optimization_params, callbacks, lr_policy, batches_per_step, stop_on_nan_loss, reset)
516 lr_policy=lr_policy,
517 batches_per_step=batches_per_step,
--> 518 stop_on_nan_loss=stop_on_nan_loss)
519
520 def eval(self,

/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/actions.py in train(self, tensors_to_optimize, optimizer, optimization_params, callbacks, lr_policy, batches_per_step, stop_on_nan_loss)
1191 continue
1192 scaled_loss.backward(
-> 1193 bps_scale.to(scaled_loss.get_device()))
1194 else:
1195 final_loss.backward(

/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
148 products. Defaults to False.
149 """
--> 150 torch.autograd.backward(self, gradient, retain_graph, create_graph)
151
152 def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
91 grad_tensors = list(grad_tensors)
92
---> 93 grad_tensors = _make_grads(tensors, grad_tensors)
94 if retain_graph is None:
95 retain_graph = create_graph

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in _make_grads(outputs, grads)
27 + str(grad.shape) + " and output["
28 + str(outputs.index(out)) + "] has a shape of "
---> 29 + str(out.shape) + ".")
30 new_grads.append(grad)
31 elif grad is None:

RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).`

Sentence classification needs a more abstract DataDesc class

Currently, the sentence classification task uses SentenceClassificationDataDesc to prepare training and testing data. If users want to use their own dataset, they need to add a process_xxx function and set num_class in SentenceClassificationDataDesc. That's not convenient, because users have to change the NeMo source code. Users should be able to define their own process_xxx function in the training script and pass the function and num_class to SentenceClassificationDataDesc as parameters; SentenceClassificationDataDesc should then use the user-defined process function to process the data.
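
A hypothetical sketch of the proposed design (names and signatures are illustrative, not the actual NeMo API): the descriptor takes the processing function and number of classes as parameters instead of hard-coding per-dataset process_xxx methods.

# Hypothetical sketch of the proposed interface; not the actual NeMo API.
from typing import Callable, List, Tuple

class SentenceClassificationDataDesc:
    def __init__(self, data_dir: str,
                 process_fn: Callable[[str], List[Tuple[str, int]]],
                 num_classes: int):
        self.num_classes = num_classes
        # The user-defined function turns raw files into (text, label) pairs.
        self.examples = process_fn(data_dir)

def process_my_dataset(data_dir: str) -> List[Tuple[str, int]]:
    # Placeholder user logic for a custom dataset.
    return [("example sentence", 0), ("another sentence", 1)]

data_desc = SentenceClassificationDataDesc("data/", process_my_dataset, num_classes=2)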

Question on multi_gpu

Hi !

Reading the documentation, I should "First set placement to nemo.core.DeviceType.AllGpu in NeuralModuleFactory and in your Neural Modules" to enable multi-GPU training. But I'm definitely not sure about which modules should or should not have the placement. Do you have an example of a quartznet.py script that enables multi-GPU?

Dockerfile onnx-tensorrt patch failing

I am trying to use your Dockerfile but running into issues with your patch

Step 8/15 : RUN git clone https://github.com/onnx/onnx-tensorrt.git && cd onnx-tensorrt && git submodule update --init --recursive && patch -f < ../onnx-trt.patch &&     mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DGPU_ARCHS="60 70 75" && make -j16 && make install && mv -f /usr/lib/libnvonnx* /usr/lib/x86_64-linux-gnu/ && ldconfig
 ---> Running in f15383978bfd
Cloning into 'onnx-tensorrt'...
Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx'...
Submodule path 'third_party/onnx': checked out '553df22c67bee5f0fe6599cff60f1afc6748c635'
Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark'
Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
Submodule path 'third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
Submodule path 'third_party/onnx/third_party/pybind11': checked out '09f082940113661256310e3f4811aa7261a9fa05'
Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx/third_party/pybind11/tools/clang'
Cloning into '/tmp/onnx-trt/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
Submodule path 'third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
patching file CMakeLists.txt
Hunk #1 FAILED at 20.
1 out of 1 hunk FAILED -- saving rejects to file CMakeLists.txt.rej
The command '/bin/sh -c git clone https://github.com/onnx/onnx-tensorrt.git && cd onnx-tensorrt && git submodule update --init --recursive && patch -f < ../onnx-trt.patch &&     mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DGPU_ARCHS="60 70 75" && make -j16 && make install && mv -f /usr/lib/libnvonnx* /usr/lib/x86_64-linux-gnu/ && ldconfig' returned a non-zero code: 1
ERROR: Job failed: command terminated with exit code 1

Mandarin ASR, predicitions stay as BLANK sequences

Hi, appreciate this great framework and your great work!
Your pretrained Mandarin QuartzNet has very good performance on the AISHELL test set, so I want to train the same model architecture on our own Mandarin reading-style data from scratch.
The train script is like this:
python -m torch.distributed.launch --nproc_per_node=2 ./jasper_aishell.py --batch_size=8 --num_epochs=150 --lr=0.00005 --warmup_steps=1000 --weight_decay=0.00001 --train_dataset=./word_4000h/lists/train.json --eval_datasets ./word_4000h/lists/dev_small.json --model_config=./aishell2_quartznet15x5/quartznet15x5.yaml --exp_name=quartznet_train --vocab_file=./word_4000h/am/token_dev_train_4400.txt --checkpoint_dir=$checkpoint_dir --work_dir=$checkpoint_dir
The training data is about 500 hours long.
At first, the predictions are pretty much random. Then, after several thousand iterations (before warmup ends), the predictions stay as BLANK sequences for two epochs, like this:
Step: 4650
2020-01-07 09:53:20,694 - INFO - Loss: 110.91824340820312
2020-01-07 09:53:20,694 - INFO - training_batch_CER: 100.00%
2020-01-07 09:53:20,694 - INFO - Prediction:
2020-01-07 09:53:20,694 - INFO - Reference: 提起华华家的事情村民们声声长叹
Step time: 0.39273500442504883 seconds

I have tried learning rates from 0.1 to 0.00005, warmup steps from 1000 to 8000, batch sizes of 4, 8, 16, and 32, and weight_decay from 0.001 to 0.00001, and none of those combinations could solve this problem.
Have you ever encountered this kind of problem?

How about supporting BPE in ASR

I find many papers using BPE as modelling units, so
I was wondering if it is possible to change the current char-based ASR to be tokenizer-based (like NLP).
By using a custom tokenizer, it may help to reuse the tokenizers in the NLP collection. (Maybe we should put some common tools into one utils module or collection, like the models and utility functions.)
And users can use whatever modelling units they want.

Potential problems:

  1. Currently every example script is char-based, including helpers.py.
  2. The beam search decoder now only supports char-based models. (It requires a vocab file.)
    The ctc_beam_search_with_lm decoder works like this:
    First, it detects whether the language model is char-based or word-based by simply checking the length of the unigrams: if there is one unigram whose length is greater than 1, then it is word-based; otherwise it is char-based.

For an n-gram language model built on words, like an English LM, it checks whether the current character is a space; if it is, it calls the language model function to compute the LM score.
For an n-gram LM built on characters, like some Mandarin LMs, it calls the LM every time a new character is appended (a new prefix is generated).
In order to make beam search with an LM available for all modelling units, we may need to change the decoder code to support this feature.

Of course, if we only want to use greedy search with different modelling units, or only use beam search on char-based models, there will be no problem.
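
For illustration only, a minimal sketch of producing BPE modelling units from transcript text with sentencepiece (an example library choice, not something the NeMo ASR collection required at the time; the file names and vocabulary size are placeholders):

# Minimal sketch, assuming a plain-text file of transcripts; file names and
# vocab size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="transcripts.txt", model_prefix="bpe", vocab_size=1024, model_type="bpe"
)
sp = spm.SentencePieceProcessor(model_file="bpe.model")
print(sp.encode("speech recognition with subword units", out_type=str))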

Correction in the 1_ASR_tutorial_using_NeMo.ipynb

I noticed a tiny error in the example Jupyter notebook:

metadata = { "audio_filename": audio_path, "duration": duration, "text": transcript }
should be

metadata = { "audio_filepath": audio_path, "duration": duration, "text": transcript }

otherwise it causes a KeyError.
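
For reference, a minimal sketch of writing one manifest entry with the corrected key (NeMo ASR manifests are JSON-lines files, one utterance per line; the path, duration, and transcript below are placeholders):

# Minimal sketch: one JSON-lines manifest entry per utterance; values are placeholders.
import json

metadata = {"audio_filepath": "path/to/audio.wav", "duration": 3.2, "text": "hello world"}
with open("train_manifest.json", "a") as f:
    f.write(json.dumps(metadata) + "\n")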

Is it possible to export to onnx format?

How would one go about exporting a model to ONNX format? I guess it's not supported out of the box, but are there any hints on how to do it, since it is probably one module?

neural factory infer function return all tensors for the whole dataset

Not sure if my understanding of this function is correct.
It seems this function returns the list of batch tensors for the whole dataset, which may cause CPU or GPU OOM if the dataset is huge or the tensors contain a lot of data, like the likelihoods for each frame with a large vocabulary.

Is it possible to modify this function to return a generator instead?

Unidecode module error when unit-testing

After installing the dependencies, including Apex, and then running the reinstall.sh script, I tried the unit tests and received an error about a missing unidecode module. A simple pip install unidecode solved this, with all tests subsequently running successfully. The error prior to the successful tests is attached, and computer details are included below for completeness.

Is this issue something to be ameliorated in the install procedure or setup file?

CentOS Linux release 7.5.1804 (Core)
Linux version 3.10.0-862.14.4.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018

Unidecode_Error.txt

Drop last layer

I want to test the English checkpoint on a new alphabet for Spanish; is it possible to drop layers in order to train on a new alphabet?

nemo_nlp.utils not found

Hi,

After following the instructions for pre-training and installation, I get the following error:

nemo_nlp.utils not found

There are following two cases:

  1. Installing nemo_nlp with pip install nemo_nlp doesn't create a utils directory in "lib/python3.6/site-packages".

  2. Installing nemo_nlp by cloning the git repository and then running setup.py creates the utils directory, but it still says "nemo_nlp.utils not found".

Thanks!

Combining Nemo with Pre-existing Classification Models

Hello! Great work here. We'd like to know if you could guide us toward the proper resources concerning how to combine NeMo with pre-existing Image Classification or even Object Detection models. Thank you for your time.

Quartznet: replicating Training

Hi,
I would like to replicate the training process of quartznet, however I'm having some issues.

In particular, the official paper and the website say that a speed perturbation (±10%) was applied to the dataset. However, I can't seem to find any trace of that in either the download script or the code.

The other issue I'm having is related to the hyperparameters used when training on LibriSpeech, such as the learning rate and warmup steps. They are not mentioned explicitly anywhere, so I was wondering if you could help me with that as well.

Thank you!

TypeError: Can't instantiate abstract class AudioToTextDataLayer with abstract methods create_ports

Hi, I installed NeMo by cloning the repo and following the installation instructions. However, when I tried to run the ASR example notebook I got this error:

File "examples/asr/jasper_eval.py", line 96, in main
**eval_dl_params)

Create the Jasper_4x1 encoder as specified, and a CTC decoder

---> 23 encoder = nemo_asr.JasperEncoder(**params['JasperEncoder'])
24
25 decoder = nemo_asr.JasperDecoderForCTC(

TypeError: Can't instantiate abstract class JasperEncoder with abstract methods create_ports

I also get a similar error when trying to evaluate a QuartzNet model on the LibriSpeech dev set:

TypeError: Can't instantiate abstract class AudioToTextDataLayer with abstract methods create_ports

Did I miss something? Thanks in advance.

Running on CPU

Hi,

I am currently trying to run 'simplest_example.py' on a CPU within a docker container.

I have tried modifying the code to run on CPU by passing:

  • "placement=DeviceType.CPU" to the Factory which produces an Error regarding CUDA:

Traceback (most recent call last):
File "simplest_example.py", line 27, in
optimizer="sgd")
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
stop_on_nan_loss=stop_on_nan_loss)
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1022, in train
'amp_min_loss_scale', 1.0))
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 359, in __initialize_amp
opt_level=AmpOptimizations[optim_level],
File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 170, in _initialize
check_params_fp32(models)
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 92, in check_params_fp32
name, param.type()))
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
raise RuntimeError(msg)
RuntimeError: Found param fc1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

To fix that issue I additionally passed:

  • 'optimization_level=1' to prevent APEX from being called which returned

2019-10-11 09:32:10,688 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
Starting .....
Starting epoch 0
Traceback (most recent call last):
File "simplest_example.py", line 27, in
optimizer="sgd")
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
stop_on_nan_loss=stop_on_nan_loss)
File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1184, in train
final_loss.get_device()))
RuntimeError: Device index must not be negative

How do I run the example on CPU? Thanks.

Experience with small dataset

Hi there!

Just to gather some of your experience working with small datasets: I'm currently investigating TIMIT with NeMo, based on the AN4 architecture with the proper CTC symbols (phonemes instead of characters). Unfortunately, I observe poor performance (around 27% WER), which is very high compared to works like:
https://arxiv.org/pdf/1701.02720.pdf

I also know that the PER reported at training time is based on greedy decoding, which might explain this performance.

Since my goal is to compare NeMo to other toolkits, I would like to be as fair as possible and ask you about specific tricks you have encountered to obtain better performance with smaller datasets. I'm currently trying to play with the architecture to reduce complexity and overfitting (also, noises are not great with TIMIT).

Thanks!

Fix documentation for the fine-tuning for multi-gpu training

In the Fine-Tuning section of the tutorial, add that in the case of distributed training the restore syntax has to be the following:
jasper_encoder.restore_from("/data/atc_tenant/Speech2/nemodata/JasperEncoder-
jasper_decoder.restore_from("/data/atc_tenant/Speech2/nemodata/JasperDecoderForCTC-STEP-247400.pt", args.local_rank)

Predict Proba from Inference

Hello,
I would like to get the predict_proba from neural_factory.infer()

Example
{
predict : "Hello World"
predict_proba : 0.85
}

Thanks

How to use pretrained models ?

I downloaded the pretrained models Aishell2 Jasper 10x5dr and QuartzNet15x5. There is an error when I use them:

NeMo-master/examples/asr$ python3 jasper_aishell_infer.py --eval_datasets ../../data/test.json --vocab_file aishell2_quartznet15x5/vocab.txt
2019-12-30 15:11:43,318 - INFO - Dataset loaded with 0.01 hours. Filtered 0.00 hours.
2019-12-30 15:11:43,318 - INFO - Evaluating 10 examples
2019-12-30 15:11:43,319 - INFO - PADDING: 16
2019-12-30 15:11:43,319 - INFO - STFT using conv
a 12
2019-12-30 15:11:48,699 - INFO - ================================
2019-12-30 15:11:48,700 - INFO - Number of parameters in encoder: 332602624
2019-12-30 15:11:48,700 - INFO - Number of parameters in decoder: 5337175
2019-12-30 15:11:48,701 - INFO - Total number of parameters in decoder: 337939799
2019-12-30 15:11:48,701 - INFO - ================================
2019-12-30 15:11:48,704 - INFO - Restoring JasperEncoder from ./aishell2_jasper10x5dr/JasperEncoder-STEP-394050.pt
Traceback (most recent call last):
  File "jasper_aishell_infer.py", line 260, in <module>
    main()
  File "jasper_aishell_infer.py", line 212, in main
    checkpoint_dir=load_dir,
  File "/usr/local/lib/python3.6/dist-packages/nemo/core/neural_factory.py", line 687, in infer
    modules_to_restore=modules_to_restore)
  File "/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/actions.py", line 1545, in infer
    mod.restore_from(checkpoint, self._local_rank)
  File "/usr/local/lib/python3.6/dist-packages/nemo/backends/pytorch/nm.py", line 111, in restore_from
    self.load_state_dict(t.load(path, map_location=load_device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for JasperEncoder:
	Missing key(s) in state_dict: "encoder.0.mconv.0.conv.weight", "encoder.0.mconv.1.weight", "encoder.0.mconv.1.bias", "encoder.0.mconv.1.running_mean", "encoder.0.mconv.1.running_var", "encoder.1.mconv.0.conv.weight", "encoder.1.mconv.1.weight", "encoder.1.mconv.1.bias", "encoder.1.mconv.1.running_mean", "encoder.1.mconv.1.running_var", "encoder.1.mconv.4.conv.weight", "encoder.1.mconv.5.weight", "encoder.1.mconv.5.bias", "encoder.1.mconv.5.running_mean", "encoder.1.mconv.5.running_var", "encoder.1.mconv.8.conv.weight", "encoder.1.mconv.9.weight", "encoder.1.mconv.9.bias", "encoder.1.mconv.9.running_mean", "encoder.1.mconv.9.running_var", "encoder.1.mconv.12.conv.weight", "encoder.1.mconv.13.weight", "encoder.1.mconv.13.bias", .......

Some parameters I set:

parser.add_argument("--model_config",default="./aishell2_jasper10x5dr/jasper10x5dr.yaml", type=str)
parser.add_argument("--load_dir",default='./aishell2_jasper10x5dr/', type=str)

Is it possible to check quality of tacotron2 training without/before waveglow?

It is my fist attempt to use tacotron2 training. I am trying to synthesis voice for none English language. I am not sure that the my dataset and training options are absolutely correct.
On the other side my hardware is not so powerful to wait calmly.
So I would like to check preliminary results to make required changes, if they are needed.

Today is the 4th day of training.
The current epoch is 10.
Step: 48000
Loss: between 0.40 - 0.75

Is this level enough to reach rough results (wav, png)?
I tried to run tts_infer.py with the default options, but I got a warning that there is no waveglow model (which might have been expected).

Thank you in advance

examples/nlp/ner.py isn't working

I've just sent a pull request to fix the issue. The model is training and achieves high accuracy and F1 score.
But I still see the following warning:
'WARNING - Data Layer does not have any weights to return. This get_weights call returns None.'

Which labels are required in the tacotron2.yaml configuration for non-English languages [TTS]?

Dear Team,

I would suppose that the original letters and characters in UTF-8 encoding are required in the labels of the Tacotron configuration file for non-English languages, but as I see in your sample for Mandarin Chinese (tacotron2_mandarin.yaml), Latin characters are used.

labels: [' ', '!', ',', '.', '?', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4']

Is that the right option?

ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1)

Hello, I tried to train my own Mandarin ASR model with the open corpus AISHELL-1, and everything seems right; the config file I used is located at examples/asr/configs/quartznet10x5.yaml. But when I attempted to convert the temporary JasperEncoder-STEP-30000.pt and JasperDecoderForCTC-STEP-30000.pt to ONNX format using the scripts/export_jasper_to_onnx.py script, an error occurred when converting the encoder .pt file. Some logs are:

Loading config file...
Determining model shape...
Num encoder input features: 64
Num decoder input features: 1024
Initializing models...
Loading checkpoints...
Exporting encoder...
2020-01-07 16:07:16,987 - WARNING - Turned off 115 masked convolutions
Module is JasperEncoder. We are removinginput and output length ports since they are not needed for deployment
/xxx/anaconda3/lib/python3.7/site-packages/torch/jit/init.py:1007: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 305, 3] (0.005420095752924681 vs. 0.005409650504589081) and 1 other locations (0.00%)
check_tolerance, _force_outplace, True, _module_class)
2020-01-07 16:07:24,303 - ERROR - ERROR: module export failed for JasperEncoder with exception number of output names provided (2) exceeded number of outputs (1)

After my own check and trace, I think there may be a bug in nemo.backends.pytorch.actions.py

input_names=input_names,
output_names=output_names,

after I removed "length" from list input_names and removed "encoded_lengths" from list output_names before calling torch.onnx.export, the converting process worked fine.

The nemo version I used is 0.9.0

jasper_infer.py causes GPU memory OOM

Using the script jasper_infer.py according to the tutorial, I find that GPU memory does not seem to be released after each batch; it keeps increasing after each batch until OOM.
model_config=/workspace/nemo/examples/asr/configs/jasper10x5dr.yaml
CUDA: 10.1
Tesla V100

Pre-trained models are no longer compatible with new model architecture for ASR

The pre-trained models have effectively become unusable since updates were made to JasperEncoder, or most probably the jasper.py module.

Example pre-trained model: https://ngc.nvidia.com/catalog/models/nvidia:quartznet15x5

Error on trying to load the same:

jasper_encoder = nemo_asr.JasperEncoder(
    jasper=jasper_model_definition['JasperEncoder']['jasper'],
    activation=jasper_model_definition['JasperEncoder']['activation'],
    feat_in=jasper_model_definition['AudioToMelSpectrogramPreprocessor']['features'])

jasper_encoder.restore_from(CHECKPOINT_ENCODER, local_rank=0)
RuntimeError: Error(s) in loading state_dict for JasperEncoder:
	Missing key(s) in state_dict: "encoder.0.mconv.0.conv.weight", "encoder.0.mconv.1.conv.weight", "encoder.0.mconv.2.weight", "encoder.0.mconv.2.bias", "encoder.0.mconv.2.running_mean", "encoder.0.mconv.2.running_var", "encoder.1.mconv.0.conv.weight", "encoder.1.mconv.1.conv.weight", "encoder.1.mconv.2.weight", "encoder.1.mconv.2.bias", "encoder.1.mconv.2.running_mean", "encoder.1.mconv.2.running_var", "encoder.1.mconv.5.conv.weight", "encoder.1.mconv.6.conv.weight", "encoder.1.mconv.7.weight", "encoder.1.mconv.7.bias", "encoder.1.mconv.7.running_mean", "encoder.1.mconv.7.running_var", "encoder.1.mconv.10.conv.weight", "encoder.1.mconv.11.conv.weight", "encoder.1.mconv.12.weight", "encoder.1.mconv.12.bias", "encoder.1.mconv.12.running_mean", "encoder.1.mconv.12.running_var", "encoder.1.mconv.15.conv.weight", "encoder.1.mconv.16.conv.weight", "encoder.1.mconv.17.weight", "encoder.1.mconv.17.bias", "encoder.1.mconv.17.running_mean", "encoder.1.mconv.17.running_var", "encoder.1.mconv.20.conv.weight", "encoder.1.mconv.21.conv.weight", "encoder.1.mconv.22.weight", "encoder.1.mconv.22.bias", "encoder.1.mconv.22.running_mean", "encoder.1.mconv.22.running_var", "encoder.1.res.0.0.conv.weight", "encoder.2.mconv.0.conv.weight", "encoder.2.mconv.1.conv.weight", "encoder.2.mconv.2.weight", "encoder.2.mconv.2.bias", "encoder.2.mconv.2.running_mean", "encoder.2.mconv.2.running_var", "encoder.2.mconv.5.conv.weight", "encoder.2.mconv.6.conv.weight", "encoder.2.mconv.7.weight", "encoder.2.m...
	Unexpected key(s) in state_dict: "encoder.0.conv.0.weight", "encoder.0.conv.1.weight", "encoder.0.conv.2.weight", "encoder.0.conv.2.bias", "encoder.0.conv.2.running_mean", "encoder.0.conv.2.running_var", "encoder.0.conv.2.num_batches_tracked", "encoder.1.conv.0.weight", "encoder.1.conv.1.weight", "encoder.1.conv.2.weight", "encoder.1.conv.2.bias", "encoder.1.conv.2.running_mean", "encoder.1.conv.2.running_var", "encoder.1.conv.2.num_batches_tracked", "encoder.1.conv.5.weight", "encoder.1.conv.6.weight", "encoder.1.conv.7.weight", "encoder.1.conv.7.bias", "encoder.1.conv.7.running_mean", "encoder.1.conv.7.running_var", "encoder.1.conv.7.num_batches_tracked", "encoder.1.conv.10.weight", "encoder.1.conv.11.weight", "encoder.1.conv.12.weight", "encoder.1.conv.12.bias", "encoder.1.conv.12.running_mean", "encoder.1.conv.12.running_var", "encoder.1.conv.12.num_batches_tracked", "encoder.1.conv.15.weight", "encoder.1.conv.16.weight", "encoder.1.conv.17.weight", "encoder.1.conv.17.bias", "encoder.1.conv.17.running_mean", "encoder.1.conv.17.running_var", "encoder.1.conv.17.num_batches_tracked", "encoder.1.conv.20.weight", "encoder.1.conv.21.weight", "encoder.1.conv.22.weight", "encoder.1.conv.22.bias", "encoder.1.conv.22.running_mean", "encoder.1.conv.22.running_var", "encoder.1.conv.22.num_batches_tracked", "encoder.1.res.0.0.weight", "encoder.2.conv.0.weight", "encoder.2.conv.1.weight", "encoder.2.conv.2.weight", "encoder.2.conv.2.bias", "encoder.2.conv.2.running_mean", "encoder....

Is there any way to make this work with the older models, or to get newer compatible pre-trained models? @okuchaiev

KeyError: 'EvalLoss'

I am getting this EvalLoss key error when trying to do a training/validation run using the ASR tutorial. See the command and the output below. Training seems to work OK, but not the evaluation step.

The same error occurs when trying to follow the notebook in the ASR tutorial. Any suggestions on how to fix this? I am using the Docker container that I pulled with this:

docker pull nvcr.io/nvidia/nemo:v0.9

I made sure I had the latest nemo toolkit and nemo asr modules by running pip install.

================================
Here is the terminal output:

root@9bb9ab3869fc:/workspace/nemo_examples/asr# python -m torch.distributed.launch --nproc_per_node=2 /workspace/nemo_examples/asr/jasper.py --batch_size=64 --num_epochs=100 --lr=0.015 --warmup_steps=8000 --weight_decay=0.001 --train_dataset=/home/pakh0002/data/train-manifests/an4_train_manifest.json --eval_datasets /home/pakh0002/data/test-manifests/an4_test_manifest.json --model_config=/workspace/nemo_examples/asr/configs/quartznet15x5.yaml --exp_name=MyLARGE-ASR-EXPERIMENT


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


/opt/conda/lib/python3.6/site-packages/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available
warnings.warn("video reader based on ffmpeg c++ ops not available")
/opt/conda/lib/python3.6/site-packages/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available
warnings.warn("video reader based on ffmpeg c++ ops not available")
Could not import torchaudio. Some features might not work.
Could not import torchaudio. Some features might not work.
2019-12-17 18:39:51,009 - INFO - Doing ALL GPU
2019-12-17 18:39:51,300 - INFO - Dataset loaded with 0.71 hours. Filtered 0.00 hours.
2019-12-17 18:39:51,300 - INFO - Parallelizing DATALAYER
2019-12-17 18:39:51,300 - INFO - Have 948 examples to train on.
2019-12-17 18:39:51,301 - INFO - PADDING: 16
2019-12-17 18:39:51,301 - INFO - STFT using conv
2019-12-17 18:39:51,382 - INFO - Dataset loaded with 0.00 hours. Filtered 0.00 hours.
2019-12-17 18:39:51,382 - INFO - Parallelizing DATALAYER
2019-12-17 18:39:51,752 - INFO - ================================
2019-12-17 18:39:51,754 - INFO - Number of parameters in encoder: 18894656
2019-12-17 18:39:51,754 - INFO - Number of parameters in decoder: 29725
2019-12-17 18:39:51,756 - INFO - Total number of parameters in decoder: 18924381
2019-12-17 18:39:51,756 - INFO - ================================
2019-12-17 18:39:51,824 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
2019-12-17 18:39:51,835 - INFO - Doing distributed training
2019-12-17 18:39:51,858 - INFO - Starting .....
2019-12-17 18:39:51,862 - INFO - Found 2 modules with weights:
2019-12-17 18:39:51,862 - INFO - JasperDecoderForCTC
2019-12-17 18:39:51,862 - INFO - JasperEncoder
2019-12-17 18:39:51,862 - INFO - Total model parameters: 18924381
2019-12-17 18:39:51,862 - INFO - Restoring checkpoint from folder MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints ...
2019-12-17 18:39:51,864 - WARNING - For module JasperEncoder, no file matches in MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints
2019-12-17 18:39:51,864 - WARNING - Checkpoint folder MyLARGE-ASR-EXPERIMENT-lr_0.015-bs_64-e_100-wd_0.001-opt_novograd-ips_1/checkpoints present but did not restore
2019-12-17 18:39:51,864 - INFO - Starting epoch 0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
2019-12-17 18:39:56,417 - INFO - Step: 0
2019-12-17 18:39:56,421 - INFO - Loss: 380.2557373046875
2019-12-17 18:39:56,421 - INFO - training_batch_WER: inf%
2019-12-17 18:39:56,422 - INFO - Prediction: ZJWJBITBIZBWBSJWBJZBJQZQBJB BZQVBBQBPBPWIPBNGBYPWQBQBDBDQBWBPBWWBZBIQBBN
2019-12-17 18:39:56,422 - INFO - Reference:
2019-12-17 18:39:56,422 - INFO - Step time: 2.671168327331543 seconds
2019-12-17 18:39:56,422 - INFO - Doing Evaluation ..............................
Traceback (most recent call last):
File "/workspace/nemo_examples/asr/jasper.py", line 309, in
main()
File "/workspace/nemo_examples/asr/jasper.py", line 305, in main
batches_per_step=args.iter_per_step)
File "/opt/conda/lib/python3.6/site-packages/nemo/core/neural_factory.py", line 616, in train
gradient_predivide=gradient_predivide)
File "/opt/conda/lib/python3.6/site-packages/nemo/backends/pytorch/actions.py", line 1512, in train
self._perform_on_iteration_end(callbacks=callbacks)
File "/opt/conda/lib/python3.6/site-packages/nemo/core/neural_factory.py", line 198, in _perform_on_iteration_end
callback.on_iteration_end()
File "/opt/conda/lib/python3.6/site-packages/nemo/core/callbacks.py", line 435, in on_iteration_end
self.action._eval(self._eval_tensors, self, step)
File "/opt/conda/lib/python3.6/site-packages/nemo/backends/pytorch/actions.py", line 709, in _eval
callback._global_var_dict)
File "/opt/conda/lib/python3.6/site-packages/nemo_asr/helpers.py", line 154, in process_evaluation_epoch
eloss = torch.mean(torch.stack(global_vars['EvalLoss'])).item()
KeyError: 'EvalLoss'
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in
main()
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', '/workspace/nemo_examples/asr/jasper.py', '--local_rank=1', '--batch_size=64', '--num_epochs=100', '--lr=0.015', '--warmup_steps=8000', '--weight_decay=0.001', '--train_dataset=/home/pakh0002/data/train-manifests/an4_train_manifest.json', '--eval_datasets', '/home/pakh0002/data/test-manifests/an4_test_manifest.json', '--model_config=/workspace/nemo_examples/asr/configs/quartznet15x5.yaml', '--exp_name=MyLARGE-ASR-EXPERIMENT']' returned non-zero exit status 1.
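One plausible explanation (not confirmed) is that the evaluation data layer is empty: the log above reports "Dataset loaded with 0.00 hours" for the eval manifest, so the evaluation loop may never produce an 'EvalLoss' entry for process_evaluation_epoch to aggregate. A quick diagnostic sketch, assuming the standard NeMo manifest fields audio_filepath and duration and using the manifest path from the command above:

    import json
    import os

    # Path taken from the command above; adjust as needed.
    manifest = "/home/pakh0002/data/test-manifests/an4_test_manifest.json"

    total_hours, missing = 0.0, 0
    with open(manifest) as f:
        for line in f:
            entry = json.loads(line)  # one JSON object per line
            total_hours += entry.get("duration", 0.0) / 3600.0
            if not os.path.isfile(entry["audio_filepath"]):
                missing += 1

    print(f"eval manifest: {total_hours:.2f} hours, {missing} missing audio files")

If this prints 0.00 hours or reports missing files, fixing the manifest paths/durations and re-running may make the KeyError disappear, since the eval set would no longer be empty.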

unidecode requirement

The code uses the unidecode module, but it is not listed as a requirement, so it does not get installed automatically for users who don't already have it.
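If that is the case, a minimal sketch of a mitigation is either to declare unidecode as a dependency or to guard the import with a clearer error; the message text below is purely illustrative and not the repository's actual code.

    # Guarded-import sketch; alternatively, declare "unidecode" in the package's
    # install_requires / requirements file so pip installs it automatically.
    try:
        from unidecode import unidecode
    except ImportError as exc:
        raise ImportError(
            "The 'unidecode' package is required for this text normalization step; "
            "install it with 'pip install unidecode'."
        ) from exc

    print(unidecode("Café Müller"))  # -> "Cafe Muller"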

CUDA out of memory

Hi, thank you for making an excellent project.
I have a question about training a large model. I can train jasper12x1SEP on a single 1080 Ti GPU, but I cannot train jasper15x5SEP on two 2080 Ti GPUs; training fails with a CUDA out of memory error. How can I train jasper15x5SEP on two 2080 Ti GPUs? (A possible workaround is sketched after the parameter list below.)
Parameters:

        num_epochs=50,
        batch_size=32,
        eval_batch_size=16,
        lr=0.015,
        weight_decay=0.001,
        warmup_steps=8000,
        checkpoint_save_freq=2000,
        train_eval_freq=100,
        eval_freq=4000
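A common workaround for this kind of out-of-memory failure (a suggestion, not a confirmed fix) is to lower the per-GPU batch size and compensate with gradient accumulation, which the example script appears to expose as --iter_per_step (it is passed to the trainer as batches_per_step in the traceback of the EvalLoss issue above). Something along these lines keeps the effective batch size at 2 GPUs × 8 × 4 = 64 while holding far fewer activations in memory at once; the manifest and config paths are placeholders:

    python -m torch.distributed.launch --nproc_per_node=2 jasper.py \
        --batch_size=8 --eval_batch_size=8 --iter_per_step=4 \
        --num_epochs=50 --lr=0.015 --weight_decay=0.001 --warmup_steps=8000 \
        --train_dataset=<train_manifest.json> --eval_datasets <test_manifest.json> \
        --model_config=<jasper15x5SEP.yaml> --exp_name=jasper15x5SEP-smallbatch

If the model still does not fit, lowering max_duration in the dataset section of the config (if present) drops the longest utterances and can further reduce peak memory.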
