mirishkarganesh

followers: 10, following: 0, repos: 34, gists: 0

Name: Mirishkar S Ganesh

Type: User

Company: IIIT Hyderabad

Bio: PhD scholar at IIIT Hyderabad, India, working on automatic speech recognition and speech enhancement.

Location: Hyderabad

Mirishkar S Ganesh's Projects

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, which presents a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

eesen

The official repository of the Eesen project

end-to-end-asr-pytorch

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.

espnet

End-to-End Speech Processing Toolkit

espresso

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

factorized-tdnn

PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi

flashlight

A C++ standalone library for machine learning

ganesh_w2v

icon_submission

Multilingual

indic-punct

indictrans2

Translation models for 22 scheduled languages of India

inference

Reference implementations of inference benchmarks

k0

k2

FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar

kaldi

This is the official location of the Kaldi project.

kaldi-onnx

Kaldi model converter to ONNX

kaldipdnn

Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN

learning_invariances_in_speech_recognition

In this work I investigate the speech command task by developing and analyzing deep learning models. State-of-the-art systems use convolutional neural networks (CNNs) because they naturally learn correlated representations, which suits speech. In particular, I develop several CNNs trained on the Google Speech Command Dataset and test them in different scenarios. A main problem in speech recognition is the variation in how different people pronounce words: one way to build a model that is invariant to this variability is to augment the dataset by perturbing the input. I study two kinds of augmentation: Vocal Tract Length Perturbation (VTLP) and Synchronous Overlap and Add (SOLA), which locally perturb the input in frequency and time, respectively. The models trained on augmented data outperform all models trained on the original dataset in accuracy, precision, and recall. The design of the CNN also affects how well invariances are learned: the inception CNN architecture helps learn features that are invariant to speech variability by using convolution kernels of different sizes. Intuitively, this is because the model can implicitly detect speech patterns of different lengths in the audio features.
