viet-asr's Introduction

VietASR (Vietnamese Automatic Speech Recognition)


⚡ Some experiments with NeMo

Model: QuartzNet, a smaller version of the Jasper model

The pretrained model in this repo was trained on ~100 hours of Vietnamese speech collected from YouTube, radio, call center recordings (8 kHz), text-to-speech data, and some public datasets (VLSP, VIVOS, FPT). It is a very small model (13M parameters), which makes inference fast ⚡

🌱 Update: A new version, built from scratch with PyTorch, is available on the v2.0 branch

Installation

  • Update and install Linux libraries:
apt-get update && apt-get install -y libsndfile1 ffmpeg
  • Python libraries:
pip install -r requirements.txt
# CPU only; install the CUDA build instead if you have an NVIDIA GPU
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
  • Install KenLM for LM decoding (Linux only):
pip install https://github.com/kpu/kenlm/archive/master.zip
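
After the steps above, a quick import check can confirm that the packages are visible to the interpreter. The sketch below is not part of the repository: the helper name `check_modules` and the list of module names are assumptions chosen for illustration. It only tests whether each top-level module can be located, not whether the installed versions match the ones pinned above.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be located."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # Module names assumed from the install commands above
    for module, found in check_modules(["torch", "torchaudio", "kenlm"]).items():
        print(f"{module}: {'found' if found else 'MISSING'}")
```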

Transcribe audio file

python infer.py audio_samples # transcribes the audio files in the audio_samples folder

Run web application

  • Run app:
python app.py # the app is served at https://localhost:5000

TODO

  • Conformer Model
  • Data augmentation: speed, noise, pitch shift, time shift, ...
  • FastAPI
  • Add Dockerfile

Citation

  @article{kuchaiev2019nemo,
    title={Nemo: a toolkit for building ai applications using neural modules},
    author={Kuchaiev, Oleksii and Li, Jason and Nguyen, Huyen and Hrinchuk, Oleksii and Leary, Ryan and Ginsburg, Boris and Kriman, Samuel and Beliaev, Stanislav and Lavrukhin, Vitaly and Cook, Jack and others},
    journal={arXiv preprint arXiv:1909.09577},
    year={2019}
  }

viet-asr's People

Contributors

dangvansam

viet-asr's Issues

Requirement.txt

Hi,

Could you provide the requirement.txt file?

Best regards,

PeterPham

Would you consider adding the MIT license?

Hi,

I find your project very helpful. Would you consider making it open to the community? Would you consider changing the license to something like MIT?

Best regards,
detrin

Unigrams

I got it running, but it shows the following message. Do you know why?
Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
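
This warning text appears to come from pyctcdecode, which cannot read the word list back from a KenLM binary file. One common way to address it is to pass the vocabulary explicitly. The sketch below assumes that an ARPA-format copy of the language model is available (the file name `3-gram-lm.arpa` is hypothetical) and that the decoder is built with pyctcdecode's `build_ctcdecoder`. The helper `load_unigrams_from_arpa` is illustrative and not part of this repository.

```python
def load_unigrams_from_arpa(arpa_path):
    """Collect the words listed in the 1-grams section of an ARPA file."""
    unigrams = []
    in_unigram_section = False
    with open(arpa_path, encoding="utf-8") as arpa_file:
        for raw_line in arpa_file:
            line = raw_line.strip()
            if line == "\\1-grams:":
                in_unigram_section = True
                continue
            if in_unigram_section:
                if line.startswith("\\"):
                    break  # reached the next section or the end marker
                fields = line.split()
                if len(fields) >= 2:
                    # fields: log10 probability, word, optional backoff weight
                    unigrams.append(fields[1])
    return unigrams


# Hypothetical usage with pyctcdecode (labels and paths are placeholders):
# from pyctcdecode import build_ctcdecoder
# decoder = build_ctcdecoder(
#     labels,
#     kenlm_model_path="models/language_model/3-gram-lm.binary",
#     unigrams=load_unigrams_from_arpa("3-gram-lm.arpa"),
# )
```

Reading the word list from the ARPA file keeps it in sync with the language model, which matters because the decoder uses the unigrams to decide which partial words are valid.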

Model

May I ask, are you using NVIDIA's Jasper model? How many hours of data was this demo trained on, and what is its current WER?
Thanks!

No module named 'g2pNp2g_simple'

When running the repo, I got the error No module named 'g2pNp2g_simple'.
What should I do to resolve this?

Error with librosa

After installing, I ran a test and got the librosa error shown below. I am using librosa version 0.10.1.

Thank you,
Tuấn

root@voice-dev-01:/opt/viet-asr# python3 infer.py audio_samples
################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

2023-08-20 08:26:24.098 | INFO | __main__:<module>:179 - transcribe audio file in : audio_samples
2023-08-20 08:26:24.098 | INFO | __main__:__init__:74 - Init VietASR with params:
2023-08-20 08:26:24.098 | INFO | __main__:__init__:75 - ========================
2023-08-20 08:26:24.098 | INFO | __main__:__init__:76 - + config: configs/quartznet12x1_vi.yaml
2023-08-20 08:26:24.098 | INFO | __main__:__init__:77 - + encoder_checkpoint: models/acoustic_model/vietnamese/JasperEncoder-STEP-289936.pt
2023-08-20 08:26:24.098 | INFO | __main__:__init__:78 - + decoder_checkpoint: models/acoustic_model/vietnamese/JasperDecoderForCTC-STEP-289936.pt
2023-08-20 08:26:24.098 | INFO | __main__:__init__:79 - + lm_path: models/language_model/3-gram-lm.binary
2023-08-20 08:26:24.098 | INFO | __main__:__init__:80 - + lm_alpha: 0.5
2023-08-20 08:26:24.098 | INFO | __main__:__init__:81 - + lm_beta: 1.5
2023-08-20 08:26:24.098 | INFO | __main__:__init__:82 - + device: cpu
2023-08-20 08:26:24.098 | INFO | __main__:__init__:83 - ========================
[NeMo I 2023-08-20 08:26:24 features:149] PADDING: 0
[NeMo I 2023-08-20 08:26:24 features:170] STFT using torch
Traceback (most recent call last):
  File "/opt/viet-asr/infer.py", line 186, in <module>
    vietasr = VietASR(
  File "/opt/viet-asr/infer.py", line 102, in __init__
    data_preprocessor = nemo_asr.AudioToMelSpectrogramPreprocessor(
  File "/opt/viet-asr/nemo/collections/asr/audio_preprocessing.py", line 352, in __init__
    self.featurizer = FilterbankFeatures(
  File "/opt/viet-asr/nemo/collections/asr/parts/features.py", line 200, in __init__
    librosa.filters.mel(sample_rate, self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq,),
TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given

root@voice-dev-01:/opt/viet-asr# pip3 list | grep librosa
librosa 0.10.1
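
The `TypeError` above is consistent with librosa 0.10 making the arguments of `librosa.filters.mel` keyword-only, so the positional call in `nemo/collections/asr/parts/features.py` no longer works. Two options seem reasonable: install an older librosa (for example a 0.9.x release), or change the call to pass `sr=` and `n_fft=` by keyword, which older librosa versions also accept. A minimal sketch of the second option follows. The wrapper name `build_mel_filterbank` and its `mel_fn` parameter are hypothetical, introduced only so the call can be exercised without librosa installed.

```python
def build_mel_filterbank(sample_rate, n_fft, n_mels, fmin, fmax, mel_fn=None):
    """Call the mel filterbank builder with keyword arguments only."""
    if mel_fn is None:
        # Imported lazily so the wrapper itself does not require librosa
        import librosa

        mel_fn = librosa.filters.mel
    # Passing sr and n_fft by keyword avoids the positional-argument TypeError
    return mel_fn(sr=sample_rate, n_fft=n_fft, n_mels=n_mels, fmin=fmin, fmax=fmax)


# The equivalent in-place change at the failing line would be (untested against this repo):
# librosa.filters.mel(sr=sample_rate, n_fft=self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq)
```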

I can't install ctc_decoders

Could you help me with this error? ModuleNotFoundError: BeamSearchDecoderWithLM requires the installation of ctc_decoders from nemo/scripts/install_decoders.py. Running install_decoders.sh still doesn't work.

No python-datautil

The python-datautil package cannot be found. Could it be a misspelling of python-dateutil?

Thanks,
Tuấn
