dangvansam / viet-asr

VietASR - Vietnamese Automatic Speech Recognition

Home Page: https://github.com/dangvansam98/viet-asr

License: Apache License 2.0

Python 99.16% HTML 0.84%

speech-recognition automatic-speech-recognition asr vietnamese-speech-recognition vietnamese-nlp vietnamese-language ctc-loss ctc-decode speech-to-text stt

viet-asr's Introduction

VietASR (Vietnamese Automatic Speech Recognition)

⚡ Some experiments with NeMo ⚡

Model: QuartzNet, a smaller version of the Jasper model

The pretrained model in this repo was trained on ~100 hours of Vietnamese speech collected from YouTube, radio, call center recordings (8k), text-to-speech data, and some public datasets (vlsp, vivos, fpt). It is a very small model (13M parameters), which makes inference fast ⚡

🌱 Update: The new version, available on branch v2.0, is built from scratch with PyTorch

Installation

Update & install Linux libs:

apt-get update && apt-get install -y libsndfile1 ffmpeg

Install Python >= 3.8

Python libs:

pip install -r requirements.txt

Install torch 1.8.1:

# CPU only; install the CUDA build instead if you have an NVIDIA GPU
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

Install kenlm for LM decoding (Linux only):

pip install https://github.com/kpu/kenlm/archive/master.zip
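
After installation, a quick sanity check can confirm that the interpreter and the key packages are in place. The snippet below is a minimal sketch and not part of this repo; the helper name check_environment and the module list (torch, torchaudio, kenlm) are assumptions based on the steps above.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # the installation steps ask for python>=3.8


def check_environment(modules=("torch", "torchaudio", "kenlm")):
    """Return a dict saying whether Python and each module are available.

    `modules` is an assumed list of top-level import names taken from the
    installation steps; adjust it to match your requirements.txt.
    """
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING points to the installation step that needs to be repeated.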

Transcribe audio file

python infer.py audio_samples # transcribes the audio files in the folder audio_samples

Run web application

Run app:

python app.py # the app runs at https://localhost:5000

Video demo on YouTube:
- v1: https://youtu.be/P3mhEngL1us
- v2: https://youtu.be/o9NpWi3VUHs

TODO

Conformer Model
Data augmentation: speed, noise, pitch shift, time shift, ...
FastAPI
Add Dockerfile

Citation

  @article{kuchaiev2019nemo,
    title={Nemo: a toolkit for building ai applications using neural modules},
    author={Kuchaiev, Oleksii and Li, Jason and Nguyen, Huyen and Hrinchuk, Oleksii and Leary, Ryan and Ginsburg, Boris and Kriman, Samuel and Beliaev, Stanislav and Lavrukhin, Vitaly and Cook, Jack and others},
    journal={arXiv preprint arXiv:1909.09577},
    year={2019}
  }

viet-asr's Issues

Requirement.txt

Hi,

Could you provide the requirement.txt file?

Best regards,

PeterPham

Would you consider adding the MIT license?

Hi,

I find your project very helpful. Would you consider making it open to the community? Would you consider changing the license to something like MIT?

Best regards,
detrin

Unigrams

I got it running, but it shows the message below. Do you know why?
Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
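
The wording of this warning matches the one emitted by the pyctcdecode library when it is given a binary KenLM file without a unigram list (which decoder is in use here is an assumption). One possible workaround, if the ARPA version of the language model is available, is to read the unigrams from it and hand them to the decoder. A minimal sketch, where read_unigrams_from_arpa is a hypothetical helper and not part of this repo:

```python
def read_unigrams_from_arpa(lines):
    """Collect the words listed in the 1-grams section of an ARPA LM.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    unigrams = []
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        if line.startswith("\\"):
            # Reached the next section (e.g. \2-grams: or \end\)
            break
        # Entry layout: log10_prob, word, optional backoff weight
        fields = line.split()
        if len(fields) >= 2:
            unigrams.append(fields[1])
    return unigrams


if __name__ == "__main__":
    # Tiny hand-written ARPA fragment, for illustration only
    sample = [
        "\\data\\",
        "ngram 1=4",
        "ngram 2=1",
        "",
        "\\1-grams:",
        "-1.0\t<unk>",
        "-0.5\t<s>\t-0.3",
        "-0.7\txin\t-0.2",
        "-0.9\tchào\t-0.1",
        "",
        "\\2-grams:",
        "-0.2\txin chào",
        "",
        "\\end\\",
    ]
    print(read_unigrams_from_arpa(sample))
```

With pyctcdecode, the resulting list could then be passed as the unigrams argument of build_ctcdecoder. Whether this repo exposes that argument is not confirmed here.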

Model

May I ask, are you using NVIDIA's Jasper model? How many hours of data was this demo trained on, and what is its current WER?
Thanks!

No module named 'g2pNp2g_simple'

When running the repo, I got the error No module named 'g2pNp2g_simple'.
What should I do to resolve this?

Error with librosa

After installing, I ran a test and got the librosa error shown below. I am using librosa version 0.10.1.

Thank you,
Tuấn

root@voice-dev-01:/opt/viet-asr# python3 infer.py audio_samples
################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

2023-08-20 08:26:24.098 | INFO | main::179 - transcribe audio file in : audio_samples
2023-08-20 08:26:24.098 | INFO | main:init:74 - Init VietASR with params:
2023-08-20 08:26:24.098 | INFO | main:init:75 - ========================
2023-08-20 08:26:24.098 | INFO | main:init:76 - + config: configs/quartznet12x1_vi.yaml
2023-08-20 08:26:24.098 | INFO | main:init:77 - + encoder_checkpoint: models/acoustic_model/vietnamese/JasperEncoder-STEP-289936.pt
2023-08-20 08:26:24.098 | INFO | main:init:78 - + decoder_checkpoint: models/acoustic_model/vietnamese/JasperDecoderForCTC-STEP-289936.pt
2023-08-20 08:26:24.098 | INFO | main:init:79 - + lm_path: models/language_model/3-gram-lm.binary
2023-08-20 08:26:24.098 | INFO | main:init:80 - + lm_alpha: 0.5
2023-08-20 08:26:24.098 | INFO | main:init:81 - + lm_beta: 1.5
2023-08-20 08:26:24.098 | INFO | main:init:82 - + device: cpu
2023-08-20 08:26:24.098 | INFO | main:init:83 - ========================
[NeMo I 2023-08-20 08:26:24 features:149] PADDING: 0
[NeMo I 2023-08-20 08:26:24 features:170] STFT using torch
Traceback (most recent call last):
File "/opt/viet-asr/infer.py", line 186, in
vietasr = VietASR(
File "/opt/viet-asr/infer.py", line 102, in init
data_preprocessor = nemo_asr.AudioToMelSpectrogramPreprocessor(
File "/opt/viet-asr/nemo/collections/asr/audio_preprocessing.py", line 352, in init
self.featurizer = FilterbankFeatures(
File "/opt/viet-asr/nemo/collections/asr/parts/features.py", line 200, in init
librosa.filters.mel(sample_rate, self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq,),
TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given

root@voice-dev-01:/opt/viet-asr# pip3 list | grep librosa
librosa 0.10.1
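
The TypeError in the traceback is consistent with librosa.filters.mel accepting only keyword arguments from librosa 0.10 onward, so the positional call in features.py no longer works. The sketch below uses a stand-in function mel with a keyword-only signature (an assumption modeled on librosa's; it is not librosa itself) to reproduce the failure and show the keyword form of the call. Pinning an older librosa release would be the alternative.

```python
def mel(*, sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Stand-in with a keyword-only signature, modeled on
    librosa.filters.mel in librosa >= 0.10. It only echoes its arguments."""
    return {"sr": sr, "n_fft": n_fft, "n_mels": n_mels, "fmin": fmin, "fmax": fmax}


def call_old_style(sample_rate, n_fft, nfilt, lowfreq, highfreq):
    # Positional call, as in the traceback: fails with keyword-only parameters
    return mel(sample_rate, n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq)


def call_new_style(sample_rate, n_fft, nfilt, lowfreq, highfreq):
    # Same call with sr and n_fft passed by keyword
    return mel(sr=sample_rate, n_fft=n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq)


if __name__ == "__main__":
    try:
        call_old_style(16000, 512, 64, 0, 8000)
    except TypeError as err:
        print("old style fails:", err)
    print("new style works:", call_new_style(16000, 512, 64, 0, 8000))
```

Applied to the line shown in the traceback, the call would presumably become librosa.filters.mel(sr=sample_rate, n_fft=self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq).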

I can't install ctc_decoders

Could you help me with this error? ModuleNotFoundError: BeamSearchDecoderWithLM requires the installation of ctc_decoders from nemo/scripts/install_decoders.py. Running install_decoders.sh still doesn't work.

No python-datautil

The package python-datautil cannot be found. Could it be python-dateutil, mistyped?

Thanks,
Tuấn